Dheeraj Singh

I am a first-year data science graduate student in the School of Informatics, Computing, and Engineering at Indiana University, Bloomington. I have a keen interest in machine learning and its applications in computer vision, natural language processing, and data science. My current coursework includes Elements of Artificial Intelligence, Applied Algorithms, and Introduction to Statistics.

Prior to my graduate studies, I worked as a Senior Project Associate at the Indian Institute of Technology (IIT) - Kanpur, where I developed a Data Visualization Web Application under the guidance of Prof. Arnab Bhattacharya and a Vehicle Recognition System under the supervision of Prof. Gaurav Pandey. I previously worked at the Bangalore office of Ipsos Research, a leading global market research firm, and at two technology startups: Tinyowl (now merged with Runnr), on the business intelligence team, and Embibe, on the data team. I completed my undergraduate studies at the Indian Institute of Technology (IIT) - Kharagpur in 2013.

Email  |  CV  |  LinkedIn  |  GitHub  |  Kaggle

  • Machine Learning: Classification, Regression, Clustering, Recommendation Systems, Text Mining, Bayesian Inference, Natural Language Processing
  • Methodologies: Linear/Logistic Regression, Decision Trees, Random Forest, SVM, Naive Bayes, K-Means, DBSCAN, Agglomerative Hierarchical, KNN, Bag-of-Words, Collaborative Filtering, Deep Neural Networks, Convolutional Neural Networks, RNN, LSTM
  • Programming Languages: Python, R, C, MATLAB
  • Libraries and Software: NumPy, Pandas, Scikit-Learn, Matplotlib, TensorFlow, Keras, NLTK, Jupyter Notebook, ggplot, dplyr, LaTeX, Git, Vim
  • Web Development: PHP, HTML, CSS, JavaScript, Bootstrap
  • Databases: MySQL, SQLite, PostgreSQL, MongoDB
  • Operating Systems: Mac OSX, Linux, Windows

Data Visualization Web Application
Advised by Prof. Arnab Bhattacharya
Indian Institute of Technology (IIT) - Kanpur, Dec'16 - May'17
PHP, MySQL, HTML, CSS, Bootstrap framework

Developed an interactive web application in PHP for real-time management and visualization of data stored in a MySQL database. Defined the complete database schema, then configured and deployed it using phpMyAdmin. Implemented device responsiveness and interoperability using the Bootstrap framework. Integrated the Google Charts API to visualize the variability of data parameters in terms of distribution, trend, correlation, deviation, ratio, and frequency.

Vehicle Recognition System
Advised by Prof. Gaurav Pandey
Indian Institute of Technology (IIT) - Kanpur, April'16 - Nov'16
Python, OpenCV, Computer Vision

Built a Python-based OCR system that identifies number plate characters using a template matching framework. Performed various image processing operations, such as morphological transformations, adaptive histogram equalization, and contour detection, using the OpenCV library. The algorithm iterates over different pixel values as the threshold for binarizing gray-scale images. The segmented, binarized characters are then compared with existing templates for identification. It achieved an accuracy of 83%, compared with 76% for the existing system, when tested on 1,000 images.
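
The threshold sweep and template comparison can be illustrated with a toy sketch. This is not the project's code: it uses plain Python lists instead of OpenCV, and the 3x3 templates, the function names, and the threshold range are all assumptions chosen for illustration.

```python
def binarize(gray, threshold):
    """Map a grayscale glyph (rows of 0-255 ints) to 0/1 pixels."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def match_score(binary, template):
    """Fraction of pixels on which the glyph and a same-sized template agree."""
    total = sum(len(row) for row in template)
    agree = sum(b == t
                for brow, trow in zip(binary, template)
                for b, t in zip(brow, trow))
    return agree / total


def recognize(gray, templates, thresholds=range(50, 250, 25)):
    """Try each threshold and return (label, score) of the best template match."""
    best_label, best_score = None, -1.0
    for threshold in thresholds:
        binary = binarize(gray, threshold)
        for label, template in templates.items():
            score = match_score(binary, template)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


# Hypothetical 3x3 templates; real plates need larger, size-normalized glyphs.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
glyph = [[30, 200, 40], [20, 210, 35], [25, 190, 30]]  # bright vertical stroke
print(recognize(glyph, TEMPLATES))
```

Sweeping the threshold instead of fixing it makes the match less sensitive to lighting, at the cost of one template comparison per threshold.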

Tinyowl Technologies - Food-tech Startup
Senior Business Analyst, May'15 - Feb'16

I led several high-impact projects aimed at growth. I collaborated with the marketing team to develop systems that drove its marketing efforts. In the process, I built a Logistic Regression model to predict the propensity of user retention and developed a k-means clustering system that segments consumers by consumption pattern in order to optimize marketing ROI. During my tenure, I also devised an algorithm to rank restaurants by performance, implementing a scoring algorithm that uses Gini coefficients and the centroid method to allocate weights to the contributing factors. In one of my projects, I built an internal dashboard to visualize and track multiple business metrics using the Shiny package in R.

Embibe - Education-tech Startup
Business Analyst, Jan'15 - April'16

The stint at Embibe required me to wear multiple hats, a trait typical of early-stage startups. I worked on an array of projects: cohort analysis of the user base, deriving insights from raw data for investor pitch decks, maintaining the Management Information System (MIS), improving and updating the database schema, and writing Python and R scripts for automation and ad-hoc data requests. As the organization matured, I became responsible for addressing every data-related issue within it.

Ipsos - Market Research firm
Analyst, Jan'13 - Dec'14

I worked as a Market Research Analyst for Ipsos. As part of my role, I conducted brand tracking and evaluated return on investment per marketing tactic by building and assessing Market Mix Models of various market scenarios. I also developed optimization strategies for effective marketing expenditure and built predictive models to forecast accrued profits. I worked in a global team and collaborated on a day-to-day basis with my offshore colleagues based in New York and Connecticut. On numerous occasions I also acted as key accounts manager for my firm, serving global clients across retail, CPG, pharmaceuticals, and restaurant chains. For my outstanding performance, I was awarded Spot Performer of Q3'2014 in the analytics domain.


Deep Learning Specialization
Python, TensorFlow, Keras, Deep Neural Networks, Convolutional Neural Networks, RNN, LSTM

Working towards the Deep Learning Specialization offered by deeplearning.ai on Coursera; I have completed the first three courses. Projects so far have involved implementing deep neural networks from scratch, exploring and tuning multiple hyperparameters, and applying different optimization techniques and regularization methods. I trained a deep neural network on the SIGNS dataset to identify the hand gesture in a given image. The specialization also covers RNN and LSTM architectures using word embeddings (Word2Vec, GloVe) for sequence modeling, NLP in particular.


Image Orientation Classification
Python, Multi-Class Classification, K-Nearest Neighbors, AdaBoost, Decision Stumps, Neural Network, Backpropagation, Gradient Descent

Trained multiple multi-class classification models on 40,000 Flickr images to predict the orientation of 1,000 images. Implemented K-Nearest Neighbors, AdaBoost (defined multiple decision stumps for weighted predictions), and Neural Networks (implemented backpropagation with mini-batch Gradient Descent to update weights), resulting in an overall prediction accuracy of 69% for K-Nearest Neighbors and 71% for AdaBoost and Neural Networks.
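
As an illustration of the K-nearest neighbor step, here is a minimal majority-vote sketch. It is not the project's implementation: the two-dimensional feature vectors, the `knn_predict` name, and the choice of squared Euclidean distance are assumptions, and real image features would have far more dimensions.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Returns the majority label
    among the k training points closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: labels stand for image orientations in degrees.
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((9, 9), 180), ((9, 8), 180), ((8, 9), 180)]
print(knn_predict(train, (1, 1)))
print(knn_predict(train, (8, 8)))
```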


Optimal Path Search
Python, Graph Search

Worked on a project aimed at finding the optimal route between a given pair of cities in the United States. Compared different Graph Search Algorithms, namely Breadth First Search, Depth First Search, Uniform Cost Search, and A-star, on the basis of path cost, time, and space requirements for multiple cost functions.
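
A minimal sketch of Uniform Cost Search, one of the algorithms compared, is shown here. The graph, node names, and edge costs are hypothetical placeholders rather than real road data, and the function name is an assumption.

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """graph: {node: [(neighbor, cost), ...]}.
    Returns (total_cost, path) for the cheapest path, or None if unreachable."""
    frontier = [(0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical directed graph with made-up costs.
graph = {
    "A": [("B", 1), ("C", 5)],
    "B": [("C", 1), ("D", 7)],
    "C": [("D", 2)],
    "D": [],
}
print(uniform_cost_search(graph, "A", "D"))
```

A-star follows the same structure but orders the queue by path cost plus a heuristic estimate of the remaining distance.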


Part-of-Speech (POS) Tagging
Python, Hidden Markov Model, Bayes Net, Naive Bayes, Bag-of-words

Developed a model to perform part-of-speech tagging for English using a Hidden Markov Model, Bayesian inference, and Naive Bayes. Trained the model by estimating initial, transition, emission, and state probabilities from a dataset of nearly 1 million words and 50,000 sentences. Implemented and compared the performance of Variable Elimination (Forward-Backward Algorithm) and the Viterbi Algorithm. The final model achieved above 50% sentence accuracy and above 90% word accuracy when tested on 2,000 sentences.
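
The Viterbi step can be sketched as follows. This is an illustrative version, not the project's code: the two-tag model and its probabilities are invented toy values, and the `floor` parameter used to avoid taking the log of zero is an assumption. The function expects a non-empty sentence.

```python
import math


def viterbi(words, tags, initial, transition, emission, floor=1e-12):
    """Most likely tag sequence for `words` under an HMM, computed in log space."""
    def lp(p):
        return math.log(max(p, floor))

    # scores[i][tag]: best log-probability of any tag path ending in `tag` at word i
    scores = [{t: lp(initial.get(t, 0)) + lp(emission[t].get(words[0], 0))
               for t in tags}]
    back = []
    for word in words[1:]:
        prev = scores[-1]
        row, pointers = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev[p] + lp(transition[p].get(t, 0)))
            row[t] = (prev[best_prev]
                      + lp(transition[best_prev].get(t, 0))
                      + lp(emission[t].get(word, 0)))
            pointers[t] = best_prev
        scores.append(row)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: scores[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model.
tags = ["NOUN", "VERB"]
initial = {"NOUN": 0.8, "VERB": 0.2}
transition = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
              "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emission = {"NOUN": {"dogs": 0.6, "bark": 0.1},
            "VERB": {"dogs": 0.05, "bark": 0.7}}
print(viterbi(["dogs", "bark"], tags, initial, transition, emission))
```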


Tweet Classification
Python, Text-Mining, Naive Bayes, Bag-of-words

Developed a Naive Bayes classifier to identify the city from which a tweet was written by computing the posterior for each city and choosing the most probable one. Implemented a Multinomial Document Model using bag-of-words, with Laplace smoothing for unseen tokens.
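
A minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing is given here. The training tweets, the whitespace tokenizer, and the function names are assumptions made for illustration; they are not taken from the project.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_tweets):
    """labeled_tweets: list of (city, text). Returns the counts used at prediction time."""
    city_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for city, text in labeled_tweets:
        tokens = text.lower().split()
        city_docs[city] += 1
        word_counts[city].update(tokens)
        vocab.update(tokens)
    return city_docs, word_counts, vocab


def predict_city(model, text, alpha=1.0):
    """Return the city with the highest log-posterior for `text`."""
    city_docs, word_counts, vocab = model
    total_docs = sum(city_docs.values())
    best_city, best_score = None, -math.inf
    for city in city_docs:
        total_words = sum(word_counts[city].values())
        score = math.log(city_docs[city] / total_docs)  # log prior
        for token in text.lower().split():
            # Laplace smoothing: a token never seen for this city still gets
            # a small nonzero probability instead of zeroing out the product.
            score += math.log((word_counts[city][token] + alpha)
                              / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_city, best_score = city, score
    return best_city


# Hypothetical toy training set.
model = train_nb([
    ("Chicago", "deep dish pizza tonight"),
    ("Chicago", "windy by the lake"),
    ("Boston", "red sox game tonight"),
    ("Boston", "walking the freedom trail"),
])
print(predict_city(model, "pizza by the lake"))
```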


N-Queens / N-Rooks Solver
Python, DFS, BFS

Developed an N-Queens and N-Rooks solver incorporating Breadth First Search and Depth First Search algorithms.
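
The depth-first variant can be sketched with an explicit stack. This is an illustrative version under assumed names, placing one queen per row; it is not the project's code.

```python
def solve_n_queens(n):
    """Return one solution as a list of column indices (one per row), or None."""
    def safe(cols, col):
        row = len(cols)
        # A new queen must not share a column or a diagonal with any placed queen.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    stack = [[]]  # each entry is a partial placement for the first len(entry) rows
    while stack:
        cols = stack.pop()
        if len(cols) == n:
            return cols
        # Push in reverse so that lower-numbered columns are explored first.
        for col in reversed(range(n)):
            if safe(cols, col):
                stack.append(cols + [col])
    return None


print(solve_n_queens(4))
```

Replacing the stack with a `collections.deque` and popping from the left turns the same loop into Breadth First Search, and dropping the diagonal check in `safe` gives an N-Rooks solver.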


Movie Recommendation System
Python, Collaborative Filtering

Built a Movie Recommendation System in Python based on a Collaborative Filtering algorithm that uses Euclidean Distance or the Pearson Coefficient to find similar users and returns a list of the top recommended movies for a given user. The similarity method and the number of movies to recommend are passed as command-line arguments. Used the MovieLens movie ratings dataset from GroupLens.
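
A minimal sketch of user-based collaborative filtering with Pearson similarity follows. The ratings, user and movie identifiers, and the similarity-weighted ranking rule are assumptions for illustration; the project's actual scoring may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation over the movies both users rated (0.0 if undefined)."""
    shared = [m for m in a if m in b]
    n = len(shared)
    if n < 2:
        return 0.0
    mean_a = sum(a[m] for m in shared) / n
    mean_b = sum(b[m] for m in shared) / n
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in shared)
    var_a = sum((a[m] - mean_a) ** 2 for m in shared)
    var_b = sum((b[m] - mean_b) ** 2 for m in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend(ratings, user, top_n=3):
    """Rank movies the user has not rated by the similarity-weighted average
    rating from positively correlated users."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    ranked = sorted(totals, key=lambda m: totals[m] / weights[m], reverse=True)
    return ranked[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"A": 5, "B": 1, "C": 4},
    "u2": {"A": 5, "B": 2, "C": 5, "D": 5, "E": 2},
    "u3": {"A": 1, "B": 5, "C": 2, "D": 1, "E": 5},
}
print(recommend(ratings, "u1"))
```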


Kaggle Competitions
Python, R, Decision Trees, Random Forest, SVM, Naive Bayes, ANN, XGBoost, K-fold Cross Validation, Feature Engineering

Trained and tested numerous prediction and classification models using supervised and unsupervised learning, including Decision Trees, Random Forest, SVM, and XGBoost. Used K-fold Cross Validation to detect and limit over-fitting. Performed data preparation such as missing value imputation, outlier removal, and feature engineering.
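
The K-fold splitting idea can be sketched in a few lines. This is an illustrative index generator under an assumed name, not the code used in the competitions; in practice the data would usually be shuffled first, or a library splitter would be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs so that every sample is used for
    validation exactly once across the k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

Averaging the validation score over the folds gives a less optimistic estimate than the score on the training data, which is how over-fitting shows up.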


Sentiment Analysis and Word Cloud
R, tm package, Natural Language Processing

This project aimed to classify tweets by sentiment. I used a bag-of-words, rule-based approach built on the opinion lexicon compiled by Hu and Liu (KDD-2004, starting from their first paper), a predefined set of about 6,800 positive and negative English words. Tweets were cleaned and reduced to a bag of words, and each word was scored against the lexicon. The cumulative score was driven by occurrences of the positive and negative words. The final score was then used to tag a tweet as positive (score>0), neutral (score=0), or negative (score<0).
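
The scoring rule can be sketched as follows. The project itself was written in R; this Python version is only an illustration, and the two tiny word sets are stand-ins for the real lexicon. The tokenizer and function names are likewise assumptions.

```python
import re

# Tiny stand-in lexicon; the real list has thousands of words.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def score_tweet(tweet):
    """Count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def tag_tweet(tweet):
    """Map the score to positive (>0), neutral (=0), or negative (<0)."""
    score = score_tweet(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tag_tweet("I love this great phone"))
```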


Speaker Recognition System
R, tuneR Package, K-Means Clustering, MFCC features

Built a speaker recognition system that uses K-means clustering to identify the time points at which the speaker changes in a given conversation (taken from YouTube videos). Used MFCC and delta coefficients as the feature vector. Pre-processed the YouTube videos and then the audio signals using the tuneR package in R.
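
Once each audio frame has a cluster label, the change points follow directly. The sketch below shows only that last step and assumes the labels have already been produced by K-means on the frame features; it is written in Python rather than R, and the function name, labels, and frame hop are hypothetical.

```python
def speaker_change_times(frame_labels, hop_seconds):
    """Return the start time (in seconds) of every frame whose cluster label
    differs from the label of the frame before it."""
    return [i * hop_seconds
            for i in range(1, len(frame_labels))
            if frame_labels[i] != frame_labels[i - 1]]


# Hypothetical labels for seven consecutive frames, one frame every 0.5 s.
labels = [0, 0, 0, 1, 1, 0, 0]
print(speaker_change_times(labels, 0.5))
```

Real label sequences tend to be noisy, so smoothing them (for example with a majority vote over a sliding window) before this step would likely reduce spurious change points.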


LinkedIn Job Finder
Python, BeautifulSoup Package

Built a LinkedIn scraper that takes a job title (e.g. 'Business Analyst', 'Machine Learning', or any combination of words) and the number of pages to crawl as command-line arguments, and returns a list of all matching jobs with job portal, job title, location, and company name. Used the BeautifulSoup module in Python. This tool can help with tedious job searches.

Credits: Jon Barron