Kazem Jahanbakhsh, PhD
I'm a technology entrepreneur with a PhD in computer science. I have an academic background in machine learning, data mining, algorithm design, social network analysis, and natural language processing. My research focuses on extracting interesting patterns and signals from big data that can be turned into valuable business and marketing actions. I was lucky to have Dr. Valerie King and Dr. Ali Shoja as my PhD supervisors.
Since 2012, I have been designing and implementing machine learning algorithms for predicting conversion rates for display ads. This is done by analyzing, in real time, a large volume of performance data for ads shown to users on websites.
I have industrial and research experience in designing and implementing NLP models such as topic modeling, NER, sentiment analysis, and spam detection. I also have experience in designing and implementing predictive models to forecast the results of political elections and flu outbreaks by mining and analyzing unstructured data from online social networks such as Twitter.
I have experience in the online fraud detection space, where I was involved in designing and implementing machine learning algorithms to process people's social profiles from Facebook, Twitter, Google+, and LinkedIn and detect fake or fraudulent digital identities.
You can view the list of my previous work in the areas of machine learning, data mining and natural language processing here: Google Scholar
You can view the latest version of my CV here: Kazem Jahanbakhsh.
I provide consulting to technology companies in the areas of Machine Learning, NLP, Data Science and Optimization. This includes helping companies solve the machine learning, data science and data challenges that impact their bottom line. See my LinkedIn profile to view a sample of companies in different verticals that I have advised in the past. If you're interested, you can reach out and book me on the Clarity platform.
Here is the word cloud of my research projects:
Link to: Contact information.
- We have ranked and published the list of Top 22 Python Programming Books for data scientists who want to learn Python.
- For Black Friday 2016, we analyzed smart watches on the market to rank and publish the list of Top 7 Smart Watches for tech gadget lovers.
- Apache Spark is a fast and general-purpose cluster computing system which has significantly impacted the data analysis ecosystem. We have used our web scrapers with ML algorithms to publish the list of Top 8 Apache Spark Books for data engineers and data scientists.
- Several people who wanted to start their careers in data science have asked me to recommend a list of data science books. I decided to implement an ML algorithm to generate the list of Top 30 Data Science Books.
- Exploiting the exponential property of social graphs plays an important role in marketing products and content. So, we have published the list of Top 8 Social Marketing Books for social marketing experts.
- In the digital space, content plays a crucial role in marketing. We have published a list of Top 7 Content Marketing Books for content marketing experts.
- Search Engine Optimization (SEO) for websites seems to be a black art. However, this is not true. There are principles on which modern search engines have been designed. We have published a list of Top 17 SEO & SEM Books for SEO experts.
- There are many software engineering books on the market that claim to cover best practices in software development. This makes picking the right book very hard for a human. We decided to use Machine Learning to cut through the noise and generate the list of Top 40 Software Engineering Books for software engineers.
- Scala is an object-oriented programming language. In addition to being object-oriented, Scala is also a functional language, and combines the best approaches to OO and functional programming. We have published the list of Top 16 Scala Books for modern software engineers.
- Docker container technology is growing very fast and has changed how applications are deployed to production. So, we decided to use Machine Learning to generate a list of Top 12 Docker Books for DevOps engineers.
- In Oct 2016, we published the list of Top 11 Growth Hacking Books. This is a list of great books in which different entrepreneurs share their experience of building a viral product.
- In July 2016, we published a list of Top 20 Startup & Entrepreneurship Books for entrepreneurs.
- In July 2016, we published a short article on the shortcomings of A/B testing in the B2B space.
- Since June 2016, we have been scraping the web and collecting a live stream of tweets to cover political conversations on the US 2016 presidential election. Check the analysis results here: US 2016 Election Trend
- In April 2016, our prediction paper for the US 2012 election was featured in Forbes: Social Media Predicts Leicester Cinderella Story In English Premier League
- In Feb 2016, we implemented a basic YouTube scraper to collect comments posted on YouTube videos. Here is the link to the YouTube crawler: youtube-crawler.
- In Dec 2015, we implemented a simple distributed web crawler using RabbitMQ. You can access the source code from its GitHub repo: distributed-crawler.
- In Sep 2015, we published a list of top machine learning & data mining books. We collected various signals using our crawlers (e.g. online reviews/ratings, price, author influence, etc.) for hundreds of ML/DM/NLP books and used the signals to discover the best books. The list helps data scientists & machine learning engineers find the right book for their needs. Check the list here: Top ML/DM Books.
- In Feb 2015, I gave a talk at Plenty of Fish on using machine learning algorithms for computational advertising. Link to the event page: adtech talk @pof.
- In Oct 2014, we published an article explaining the mathematics behind the Latent Dirichlet Allocation (LDA) model using the Collapsed Gibbs Sampling technique. We also pushed a Java implementation of LDA to a GitHub repository. Read the article here: Implementing LDA Model using Collapsed Gibbs Sampling
- In Aug 2014, we wrote an article about how advertisers on Google AdWords can minimize their costs when advertising for known brand names like eBay or Amazon. You can read the article here: You May Have Been Wasting your Money on Google AdWords
- In Aug 2014, I wrote an article on how advertising companies can use machine learning to maximize their return on investment (ROI). You can read the full article here: Multi-Armed Bandit Algorithms and Online Advertising
- In July 2014, we open-sourced our Java ML/NLP library. We used this library for mining tweets and building predictive models. The predictive models are used for analyzing elections and mining public opinion in social media. You can pull the code from GitHub: Twitter Mining.
- I wrote an article on Deep Learning where I highlighted the challenges and benefits that DL brings to the ML community. Read the article here: Deep Learning: Challenges and Excitements
- I documented my notes from the KDD 2013 conference in Chicago. Read my post here: My Takes from KDD 2013
- In Nov 2013, I gave a talk on Using Machine Learning & Statistics To Predict The US Presidential Election at machine learning and data science meetups in Vancouver.
- I wrote a short article, "Who's a Data Scientist?", which describes my academic and industrial experience in the data analysis area.
- In March 2013, we wrote an article, "Tracking Social Media Trends and Their Influence on E-Commerce Markets", where we showed a correlation between eBay consumers and social media trends (e.g. Twitter).
- I defended my PhD in August 2012. My thesis topic was "Contact Prediction, Routing and Fast Information Spreading in Social Networks". You can download the pdf of my thesis from Jahanbakhsh_Kazem_PhD. You can also download my defence slides from phd slides.
Open Source Projects
- Snake Game AI: In July 2013, we attended the hackwithus event @Victoria and built a simple AI agent for the Snake game.
- Image Hunt Game: In March 2013, we attended the Mobify hackathon where we built a fun game with a purpose (GWAP) to tag images. Learn more about the game here: Image Hunt.
- Van City Talks: In Feb 2013, we attended the Open Data Day Hackathon. We used Vancouver open data and built a web app to compute and show a quality-of-life score for different regions of the city of Vancouver. Read more about this project here: Vancouver City Talks
- Flu Predictor App: In January 2013, we attended the Firefox OS App Day in Vancouver where we built a mobile web app to estimate the likelihood that a person catches the flu by collecting and analyzing tweets. Read more about this project here: Predicting Flu
- Real-Time Bus Tracking System: This was a project that we designed, implemented and demoed at the AngelHack Hackathon in Seattle. It was a mobile crowdsensing system (iPhone/Android) for tracking bus locations in real time using machine learning algorithms. Click RTBTS to find out more about this project.
- Predicting US 2012 Presidential Election using Twitter: This is an ongoing research project for analyzing and mining 2012 US election conversations on Twitter. The main goal is to test the possibility of predicting election results using political tweets. Read more about this project from Predicting US 2012 Election Results.
- Geo Crawler: This is a project for crawling and indexing places that are hard to find using the Google Maps service. Click Geo Crawler to read more about this project.
- Twheat Map: A web application for showing a real-time map of geo-tagged tweets with their labels (positive/negative) computed by a sentiment analysis algorithm. This application was implemented at the AbeBooks Hackathon 2012 event in Victoria. Click here to find out more about this application.
- Mobile Social Trivia Game: a Twilio SMS powered trivia application developed at the HackVan 2012 event in Vancouver. Enter a code and join a multi-player trivia SMS game. Click Trivia to find out more about this project.
- K-means Clustering: a Python implementation of the k-means algorithm. Click k-means to find out more about the algorithm and download the code.
- Drinking-Fountain Finder App: a web application which shows the closest drinking fountain to your current location. This application was developed at the Open Data Hackathon event in Vancouver. Click Fountains to find out more about this application.
- Social Community Detection: an implementation of the Girvan-Newman community detection algorithm for weighted graphs in Python. You can find out more about this code and download its source code from Cmty link.
- Flickr Crawler & Hometown Predictor: a two-layer crawler for collecting the friendship graph of people and the attributes of their uploaded photos from the Flickr website. The main goal of this project was to predict Flickr users' hometowns by exploiting the geotag information of their uploaded photos. You can download the source code and find out more about the crawler from Flickr link.
- Reliable Datagram Protocol: a multi-threaded reliable transport layer implemented in C. It is an application-layer protocol that runs on top of UDP to make it as reliable as TCP. You can read more about this project and download its source code from RDP link.
- Language Detection: a Java applet for recognizing the language of an input sentence using a Naive Bayes classifier. Enter a sentence and find out its natural language. You can read more about this project and download its source code from Language Recognition link.
- Soma-Cube Puzzle Solver: a Java program for solving the 7-piece Soma Cube puzzle using a recursive backtracking search. You can read more about the puzzle and download the puzzle solver's source code from Soma Cube link.
- Autonomous Flying Blimp: an embedded system developed for controlling an autonomous blimp. We developed both the hardware and software to control a flying blimp. I did this project with two colleagues in 2008 for the "Software for Embedded and Mechatronics Systems" course. You can find the design and source code for the flying blimp at Flying Blimp. You can also watch a demo of the flying blimp here: Flying Blimp.
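To illustrate the K-means Clustering entry in the list above, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the code from the linked project; the function name `kmeans`, its parameters, and the sample points are hypothetical and chosen only for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm).

    Hypothetical helper for illustration; returns (centroids, assignments).
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the iteration has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(sample, k=2)
    print(centers, labels)
```

The result depends on the random initialization, so in practice the algorithm is usually run several times and the clustering with the lowest total within-cluster distance is kept.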
Software Research Projects
- Information Spreading/Advertising in Online Social Networks: an efficient and scalable program implemented in C for analyzing running times of rumor spreading algorithms in online social networks. Click Spread to find more about this project.
- Social Networks Connectivity: a C program for analyzing the detailed connectivity of online social networks such as Facebook. Click Connectivity to find out more about this project.
- Social-Sim Simulator: a comprehensive simulator written in C++ for studying the underlying properties of mobile social networks and for evaluating our proposed Social-Greedy routing algorithm. You can find more technical details about this project and download its source code from Social-Sim link.
- Human Contact Predictor: a Python program for inferring people's movements and contact patterns in real scenarios such as conference or campus environments by exploiting statistical properties of contact graphs. Visit Prediction for more information.
- Diffusion of Virus in Social Networks: an efficient C program for simulating how a virus or disease diffuses in social networks. You can find out more about this code at Diffusion.
- Distributed Computing (Parallel SIQS): a parallel and optimized program written in C using the Message Passing Interface library for cracking large RSA keys. This project was part of my master's thesis. In this project, I also built and configured a "Linux Cluster" of 17 nodes to crack RSA keys. You can find out more about my thesis and its code at PSIQS. You can also download my master's thesis presentation from master slides.
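As a rough illustration of the kind of experiment described in the Information Spreading entry above, the sketch below simulates the classic synchronous "push" rumor spreading protocol and counts the rounds until every node is informed. It is written in Python rather than C, it is not the project's code, and the function name `push_rounds`, the adjacency-list format, and the `max_rounds` guard are assumptions made for this example.

```python
import random


def push_rounds(adjacency, source, seed=0, max_rounds=10_000):
    """Count rounds until all nodes are informed under the push protocol.

    adjacency: dict mapping each node to a list of its neighbors
    (hypothetical input format). In every round, each informed node
    forwards the rumor to one neighbor chosen uniformly at random.
    """
    rng = random.Random(seed)
    informed = {source}
    for rounds in range(1, max_rounds + 1):
        newly_informed = set()
        for node in informed:
            neighbors = adjacency[node]
            if neighbors:
                newly_informed.add(rng.choice(neighbors))
        informed |= newly_informed
        if len(informed) == len(adjacency):
            return rounds
    # Guard against disconnected graphs, where the rumor can never reach everyone.
    raise ValueError("rumor did not reach all nodes within max_rounds")


if __name__ == "__main__":
    n = 8
    # Complete graph on n nodes: every node is a neighbor of every other node.
    complete = {u: [v for v in range(n) if v != u] for u in range(n)}
    print(push_rounds(complete, source=0))
```

Because the informed set can at most double in each round, the push protocol needs at least log2(n) rounds on any graph with n nodes; averaging the round count over many seeds gives an empirical running time for a given network.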
- K. Jahanbakhsh and Y. Moon, The Predictive Power of Social Media: On the Predictability of U.S. Presidential Elections using Twitter, submitted to a data mining conference.
- K. Jahanbakhsh, V. King, G.C. Shoja, Predicting Missing Contacts in Mobile Social Networks, Pervasive and Mobile Computing Journal (PMC), 2012.
- K. Jahanbakhsh, V. King, G.C. Shoja, Predicting Human Contacts in Mobile Social Networks using Supervised Learning, Simplex 2012 (in conjunction with WWW 2012), Lyon, France.
- K. Jahanbakhsh, V. King, G.C. Shoja, Empirical Comparison of Information Spreading Algorithms in the Presence of 1-Whiskers, Social Computing 2011, MIT, Boston, USA (Read More).
- K. Jahanbakhsh, V. King, G.C. Shoja, Predicting Missing Contacts in Mobile Social Networks, World of Wireless Mobile and Multimedia Networks (WoWMoM) 2011, Lucca, Italy. [Slides]
- K. Jahanbakhsh, V. King, G.C. Shoja, They Know Where You Live, posted on arXiv, 2010.
- K. Jahanbakhsh, G.C. Shoja, V. King, Human Contact Prediction Using Contact Graph Inference, 2010 International Symposium on Social Computing and Networking (SocialNet-2010), Hangzhou, China. [Slides]
- K. Jahanbakhsh, G.C. Shoja, V. King, Social-Greedy: A Socially-Based Greedy Routing Algorithm for Delay Tolerant Networks, ACM/SIGMOBILE MobiOpp, February 2010, Pisa, Italy.
- Y.O. Yazir, K. Jahanbakhsh, S. Ganti, G.C. Shoja, Y. Coady, A low-cost realistic testbed for mobile ad hoc networks, PACRIM, 2009, Victoria, Canada.
- M. Ghelichi, K. Jahanbakhsh, E. Sanaei, RCCT: Robust Clustering with Cooperative Transmission for Energy Efficient Wireless Sensor Networks, 7th International Conference on Information Technology : New Generations, 2008.
- K. Jahanbakhsh, M. Hajhosseini, Improving Performance of Cluster Based Routing Protocol using Cross-Layer Design, 2008. [You can find more details about this paper here.]
- K. Jahanbakhsh, J. Papadopoulos, An efficient Parallel Implementation of Self Initialization Quadratic Sieve for Integer Factorizations Using Message Passing Interface (MPI), Proceedings of 14th Iranian Conference on Electrical Engineering, Tehran (IRAN), May 2006.
- N. Jahangiri, K. Jahanbakhsh, M. Yaghubi, B. V. Vahdat, Device Drivers Skeleton in Windows 98, Proceedings of 12th Iranian Conference on Electrical Engineering, Mashhad (IRAN), May 2004.
TA for Randomized Algorithms (CSC 423 : Spring 2012)
TA for Algorithms and Data Structures I (CSC 225 : Spring 2011)
TA for Introduction to Operating Systems (CSC 360 : Fall 2008, Summer and Fall 2010)
Lab Instructor for Computer Communication and Networks (CSC 361: 2008 - 2010, 2011)
TA for Operations Research: Simulation (CSC 546: Fall 2008)
PC member of Social Computing 2013, Washington, D.C., 2013.
Reviewer of SODA 2013, New Orleans, Louisiana USA, 2013.
PC member of Social Computing 2012, Amsterdam, The Netherlands, 2012
Reviewer of SocialCom 2011, MIT, Boston, USA, 2011
Reviewer of Pervasive and Mobile Computing Journal, 2011
I'm interested in playing chess and solving puzzles. Recently I have been introduced to an exciting variant of chess called Hostage by John Leslie. You can take a look at this game @
Link to My GitHub