
SDE II
I am a Machine Learning Engineer at Percolata. My areas of interest are machine learning, data science, and deep learning. Programming Languages and Software: Python, MATLAB, R. Libraries and Toolkits: SciPy, NumPy, scikit-learn, pandas, PyTorch, TensorFlow, CVX. ★ GitHub: https://github.com/SidharthGulati ★ Website: https://sites.google.com/a/g.ucla.edu/sidharthgulati/ ★ Contact Info: sidharthgulati@g.ucla.edu
UCLA
Master’s Degree, Electrical Engineering
January 1, 2015 – January 1, 2017
University of California, Berkeley
Summer School, Statistics, Electrical Engineering
January 1, 2012 – January 1, 2012
Netaji Subhas Institute of Technology
Bachelor’s Degree, Electronics and Communications Engineering
January 1, 2009 – January 1, 2013
Amazon Web Services (AWS)
SDE II
January 1, 2022 – Present
San Francisco Bay Area
Qeexo
Senior Machine Learning Engineer
February 1, 2021 – December 1, 2021
Qeexo
Machine Learning Engineer
February 1, 2018 – February 1, 2021
Percolata
Machine Learning Engineer
June 1, 2017 – February 1, 2018
San Francisco Bay Area
Mahindra Comviva
Software Engineer
June 1, 2013 – May 1, 2015
Gurgaon, India
Defence Research and Development Organisation
Research Assistant
June 1, 2011 – July 1, 2011
Defence Research and Development Organisation, Hyderabad
Neural Image Captioning
September 1, 2016 – December 1, 2016
• Summarized images using a cascade of a Convolutional Neural Network (CNN) for encoding and a Recurrent Neural Network (RNN) for decoding.
• Embedded the images using a pre-trained Inception-v3 CNN model and captioned them with an RNN that takes the image embedding as its initial state.
• Compared the BLEU scores of two RNN variants, LSTM and GRU, on an AWS p2 GPU instance.
Big Data Analysis with Apache Spark
July 1, 2016 – August 1, 2016
• Millionsong Regression Pipeline: Developed an end-to-end linear regression pipeline to predict the release year of a song from a set of audio features. Implemented a gradient descent solver for linear regression, used Spark's machine learning library (MLlib) to train additional models, tuned the models via grid search, and improved accuracy using quadratic features.
• Click-through Rate Prediction Pipeline: Constructed a logistic regression pipeline to predict click-through rate using data from a recent Kaggle competition. Extracted numerical features from the raw categorical data using one-hot encoding, reduced the dimensionality of these features via hashing, trained logistic regression models using MLlib, tuned hyperparameters via grid search, and interpreted the probabilistic predictions via an ROC plot.
• Neuroimaging Analysis via PCA: Identified patterns of brain activity in larval zebrafish. Worked with time-varying images (generated using a technique called light-sheet microscopy) that capture a zebrafish's neural activity as it is presented with a moving visual pattern. After implementing distributed PCA from scratch and gaining intuition by working with synthetic data, used PCA to identify distinct patterns across the zebrafish brain that are induced by different types of stimuli.
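As an illustration of the gradient descent solver mentioned in the first bullet, here is a minimal single-machine sketch in plain Python. It is not the Spark implementation from the project: the function name `fit_linear_gd`, the learning rate, the step count, and the toy data are all assumptions made for this example.

```python
def fit_linear_gd(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error.

    Hypothetical helper: lr and steps are illustrative defaults, not tuned values.
    """
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this sample.
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            # Accumulate the gradient of the mean squared error.
            for j in range(n_features):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        # Step against the gradient.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3*x0 - 2*x1 + 1 (made up for this example).
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.25]]
ys = [3.0 * x0 - 2.0 * x1 + 1.0 for x0, x1 in xs]
w, b = fit_linear_gd(xs, ys)
```

On this noise-free toy data the solver recovers weights close to 3 and -2 and a bias close to 1. A distributed version would compute the per-sample gradient terms in parallel and sum them before each update.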
Yelp Restaurant Photo Classification (Deep Learning)
June 1, 2016 – September 1, 2016
• Tagged restaurants with multiple labels based on business photographs uploaded by users, as part of a Kaggle competition.
• Applied transfer learning in TensorFlow by retraining the final layer of a pre-trained Inception-v3 model.
• Obtained a mean F1 score of 0.7047, precision of 0.7203, and recall of 0.6897 on an AWS m4.2xlarge EC2 instance (26 ECUs, 8 vCPUs, 2.4 GHz, Intel Xeon E5-2676v3, 32 GiB memory, EBS only).
Toolkit: Python (pandas, scikit-learn, TensorFlow, NumPy, matplotlib)
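For readers relating the three reported metrics, the sketch below computes F1 as the harmonic mean of precision and recall. The function name `f1_score` is chosen for this example, and a mean F1 averaged per sample need not equal this value in general, so treat the check as a consistency illustration rather than the competition's scoring code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported above.
reported_f1 = f1_score(0.7203, 0.6897)
print(round(reported_f1, 4))
```

With the reported precision and recall, the harmonic mean rounds to 0.7047, which agrees with the reported F1 score.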
Portfolio Optimization with Risk Measure as Value-at-Risk (Financial Optimization)
April 1, 2016 – June 1, 2016
• Developed a statistical model for optimal investment portfolio design that minimizes worst-case Value-at-Risk under an ambiguous probability distribution of stock prices, using large-scale optimization methods.
• The ambiguity measure was the Kullback–Leibler distance from the actual distribution of the stock prices. Proximal gradient, Douglas-Rachford, ADMM, and Nesterov's accelerated method (FISTA) were implemented to optimize the objective function.
Toolkit: MATLAB, CVX
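The Kullback–Leibler distance used as the ambiguity measure can be sketched for discrete distributions as follows. This is only an illustration in Python (the project itself used MATLAB and CVX): the function name, the direction of the divergence, and the example distributions are assumptions, since the source does not specify them.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Made-up example: a nominal distribution and a perturbed candidate.
nominal = [0.5, 0.3, 0.2]
perturbed = [0.4, 0.4, 0.2]
distance = kl_divergence(perturbed, nominal)
```

An ambiguity set would then contain every candidate distribution whose divergence from the nominal one stays below a chosen radius. Note that the measure is not symmetric, so `kl_divergence(perturbed, nominal)` and `kl_divergence(nominal, perturbed)` generally differ.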
Airbnb New User Bookings (Machine Learning)
January 1, 2016 – Present
• Implemented a statistical model that predicts the 5 most probable destination countries for Airbnb users using a boosting algorithm, achieving an accuracy score of 86.4992%.
• The dataset was provided by Airbnb; features such as age, gender, signup method, and affiliate information were used to predict the probable destinations. Extreme Gradient Boosting (XGBoost) trees were used as the classifiers.
Toolkit: Python (NumPy, pandas, XGBoost, scikit-learn)
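Selecting the 5 most probable destinations from a classifier's per-class probabilities can be sketched as below. The function name `top_k_labels`, the country labels, and the probabilities are invented for illustration; in the project these probabilities would come from the trained boosted-tree model.

```python
def top_k_labels(probabilities, k=5):
    """Return the k labels with the highest probability, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical per-class probabilities for one user.
example = {
    "NDF": 0.55, "US": 0.25, "other": 0.08, "FR": 0.05,
    "IT": 0.03, "GB": 0.02, "ES": 0.02,
}
top5 = top_k_labels(example)
```

Because Python's sort is stable, classes with equal probability keep their original dictionary order.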
Perceptual Dissimilarity and Intra-Speaker Indication (Speech Processing)
January 1, 2016 – March 1, 2016
• Developed a statistical model to analyze the perceptual dissimilarity of different speakers and measured the intra-speaker indication.
• Features such as F0, F1, F2, F3, F4, HNR, CPP, H1-H2, MFCC coefficients, and LPCC were used to characterize the speech utterances, and a classification error rate of 3.33% was obtained using AdaBoost trees.
Toolkit: MATLAB
Person of Interest (PoI) and Email Author Identification (Machine Learning)
November 1, 2015 – Present
Identified Enron employees who may have committed fraud, based on the public Enron financial and email dataset, with a precision of 0.3. Also identified email authors (using a dataset of over 70,000 emails for 7 POIs) with an accuracy score of 97%. Toolkit: Python (nltk, scikit-learn)
Real-time Image processing for determining traffic density and computing the duration of the traffic light
May 1, 2012 – Present
The project counts the number of vehicles at a traffic junction by contour counting, followed by filtering the detected contours by size. It also uses a new pixel-counting technique to estimate the traffic density at a junction, taking the ratio of white pixels to the total number of pixels as the traffic measure. First, a sequence of images is acquired from the traffic light camera and the edges are detected using the most efficient edge detection technique. The resulting images are then used to compute the traffic density at the junction by the two methods above. The green traffic light duration is determined from the processed image. Finally, all these applications were consolidated into a single graphical user interface.
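The pixel-counting measure described above can be sketched in a few lines of Python. The image is assumed to be an already edge-detected binary grid (nonzero means white). The mapping from density to green time, including the 10 s and 60 s bounds, is purely hypothetical, since the source does not state how the duration was derived.

```python
def white_pixel_ratio(binary_image):
    """Ratio of white (nonzero) pixels to total pixels in a 2-D binary image."""
    total = sum(len(row) for row in binary_image)
    if total == 0:
        return 0.0
    white = sum(1 for row in binary_image for pixel in row if pixel)
    return white / total


def green_time_seconds(density, min_s=10.0, max_s=60.0):
    """Hypothetical linear mapping from traffic density to green-light time."""
    density = min(max(density, 0.0), 1.0)  # clamp to [0, 1]
    return min_s + density * (max_s - min_s)


# Tiny made-up edge map: 4 white pixels out of 12.
edges = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
density = white_pixel_ratio(edges)
green = green_time_seconds(density)
```

A denser edge map yields a higher ratio and therefore a longer green phase under this assumed mapping.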
Text Independent Speaker Recognition using Gaussian Mixture Model
July 1, 2011 – October 1, 2011
I developed a robust statistical method to model a speaker's identity based on speaker-dependent spectral shapes. The model used feature extraction based on a mel-cepstral feature representation, followed by estimation of the probability distribution using a Gaussian Mixture Model.
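The identification step of such a system can be sketched as scoring a sequence of feature frames against each speaker's mixture model and picking the best match. The sketch below assumes diagonal covariances and already-trained parameters (training by EM is omitted). All names, the two toy models, and the 2-dimensional "frames" are assumptions for illustration, not the original implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def identify_speaker(frames, models):
    """Return the name of the model with the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))


# Two hypothetical speaker models: (weights, means, variances).
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
frames = [[0.2, -0.1], [0.9, 1.2]]  # stand-ins for mel-cepstral feature vectors
best = identify_speaker(frames, models)
```

Because the decision depends only on the distribution of the frames and not on their order or content, scoring of this kind is text-independent.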
Cultural Fit Analysis
The candidate has a strong background in research and development, particularly in machine learning and data science, which aligns with an innovative and data-driven culture. The diverse range of personal projects demonstrates initiative and a passion for the field. However, the career progression from SDE II at AWS to a Data Analyst target role might require clarification regarding long-term career aspirations and alignment with typical Data Analyst responsibilities, which often involve more business-centric analysis rather than pure ML model development.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a strong problem-solving orientation and an ability to work on complex, multi-faceted problems. The variety of projects suggests adaptability and a proactive approach to learning new techniques. However, without specific behavioral assessment data, it's difficult to fully assess soft skills like teamwork or stress handling.