Sidharth Gulati

Data Analyst

https://www.opentalent.in/sidharth-gulati

SDE II

Key Strengths

Extensive experience in Machine Learning, Deep Learning, and statistical modeling, highly relevant for a Data Analyst role.
Proficiency in Python (pandas, sklearn, TensorFlow, NumPy, XGBoost, NLTK) and MATLAB, essential tools for data analysis.
Experience with Big Data technologies like Apache Spark for large-scale data processing and analysis.
Strong academic background with a Master's in Electrical Engineering from UCLA and a Bachelor's in Electronics and Communications Engineering.
Demonstrated ability to apply advanced analytical techniques to diverse datasets (image, text, financial, audio).

Cultural & Operational Fit

Cultural Fit Analysis

The candidate has a strong background in research and development, particularly in machine learning and data science, which aligns with an innovative and data-driven culture. The diverse range of personal projects demonstrates initiative and a passion for the field. However, the career progression from SDE II at AWS to a Data Analyst target role might require clarification regarding long-term career aspirations and alignment with typical Data Analyst responsibilities, which often involve more business-centric analysis rather than pure ML model development.

Soft Skills & Operational Fit

The candidate's project descriptions indicate a strong problem-solving orientation and an ability to work on complex, multi-faceted problems. The variety of projects suggests adaptability and a proactive approach to learning new techniques. However, without specific behavioral assessment data, it's difficult to fully assess soft skills like teamwork or stress handling.


About

I am a Machine Learning Engineer at Percolata. My areas of interest are machine learning, data science, and deep learning.

Programming Languages and Software: Python, MATLAB, R
Libraries and Toolkits: SciPy, NumPy, scikit-learn, pandas, PyTorch, TensorFlow, CVX

★ GitHub: https://github.com/SidharthGulati
★ Website: https://sites.google.com/a/g.ucla.edu/sidharthgulati/
★ Contact Info: sidharthgulati@g.ucla.edu

Top Skills

Python, MATLAB, SQL, Machine Learning, Big Data Analytics, Deep Learning, Convex Optimization, Apache Spark, Hadoop, Hive, Game Theory, Image Processing, Signal Processing, Java, C++, Linux, R, Data Analysis, Git, Maven, Microsoft Office, Algorithms, Neural Networks, Classification

Education

UCLA

Master’s Degree, Electrical Engineering

January 1, 2015 – January 1, 2017

University of California, Berkeley

Summer School, Statistics, Electrical Engineering

January 1, 2012 – January 1, 2012

Netaji Subhas Institute of Technology

Bachelor’s Degree, Electronics and Communications Engineering

January 1, 2009 – January 1, 2013

Experience

Amazon Web Services (AWS)

SDE II

January 1, 2022 – Present

San Francisco Bay Area

Qeexo

Senior Machine Learning Engineer

February 1, 2021 – December 1, 2021

Qeexo

Machine Learning Engineer

February 1, 2018 – February 1, 2021

Percolata

Machine Learning Engineer

June 1, 2017 – February 1, 2018

San Francisco Bay Area

Mahindra Comviva

Software Engineer

June 1, 2013 – May 1, 2015

Gurgaon, India

Defence Research and Development Organisation

Research Assistant

June 1, 2011 – July 1, 2011

Defence Research and Development Organisation, Hyderabad

Projects

Neural Image Captioning

September 1, 2016 – December 1, 2016

• Summarized images using a cascade of a Convolutional Neural Network (CNN) for encoding and a Recurrent Neural Network (RNN) for decoding.
• Embedded the images using a pre-trained Inception-v3 CNN model and captioned them using an RNN with the image embeddings as its initial state.
• Compared the BLEU scores of different RNN models, namely LSTM and GRU, on an AWS p2 GPU machine.

Big Data Analysis with Apache Spark

July 1, 2016 – August 1, 2016

• Millionsong Regression Pipeline: Developed an end-to-end linear regression pipeline to predict the release year of a song given a set of audio features. Implemented a gradient descent solver for linear regression, used Spark's machine learning library (MLlib) to train additional models, tuned models via grid search, and improved accuracy using quadratic features.
• Click-through Rate Prediction Pipeline: Constructed a logistic regression pipeline to predict click-through rate using data from a recent Kaggle competition. Extracted numerical features from the raw categorical data using one-hot encoding, reduced the dimensionality of these features via hashing, trained logistic regression models using MLlib, tuned hyperparameters via grid search, and interpreted probabilistic predictions via a ROC plot.
• Neuroimaging Analysis via PCA: Identified patterns of brain activity in larval zebrafish. Worked with time-varying images (generated using a technique called light-sheet microscopy) that capture a zebrafish's neural activity as it is presented with a moving visual pattern. After implementing distributed PCA from scratch and gaining intuition by working with synthetic data, used PCA to identify distinct patterns across the zebrafish brain that are induced by different types of stimuli.
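The gradient descent solver mentioned in the first bullet can be illustrated with a minimal sketch. This is not the Spark code from the project, which is not shown in this profile; it is a plain-Python, single-machine version, and the function name `fit_linear_regression`, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
def fit_linear_regression(features, targets, learning_rate=0.5, num_iters=2000):
    """Fit linear regression weights by batch gradient descent on mean squared error.

    Hypothetical helper for illustration only. Returns a list whose last
    entry is the intercept and whose other entries are the feature weights.
    """
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)  # final slot holds the intercept

    for _ in range(num_iters):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(features, targets):
            # zip stops at len(x), so the intercept slot is added separately
            prediction = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
            error = prediction - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
            gradient[-1] += error
        # Step against the averaged gradient
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, gradient)]
    return weights


# Toy data generated from y = 2x + 1 (made up for this example)
toy_weights = fit_linear_regression(
    [[0.0], [0.25], [0.5], [0.75], [1.0]],
    [1.0, 1.5, 2.0, 2.5, 3.0],
)
```

In a distributed setting the inner loop over samples would typically become a map-and-sum over partitions, with only the small gradient vector collected on the driver at each step.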

Yelp Restaurant Photo Classification (Deep Learning)

June 1, 2016 – September 1, 2016

• Tagged restaurants with multiple labels based on business photographs uploaded by users, as part of a Kaggle competition.
• Used a pre-trained Inception-v3 model and retrained the final layer of the neural network in TensorFlow via transfer learning.
• Obtained a mean F1 score of 0.7047, precision of 0.7203, and recall of 0.6897 on an AWS m4.2xlarge EC2 machine (26 ECUs, 8 vCPUs, 2.4 GHz Intel Xeon E5-2676v3, 32 GiB memory, EBS only).
Toolkit: Python (pandas, sklearn, TensorFlow, NumPy, matplotlib)

Portfolio Optimization with Risk Measure as Value-at-Risk (Financial Optimization)

April 1, 2016 – June 1, 2016

• Developed a statistical model for optimal investment portfolio design that minimizes worst-case Value-at-Risk under an ambiguous probability distribution of stock prices, using large-scale optimization methods.
• Ambiguity was measured by the Kullback–Leibler divergence from the actual distribution of the stock prices. Proximal Gradient, Douglas-Rachford, ADMM, and Nesterov's methods (FISTA) were implemented to optimize the objective function.
Toolkit: MATLAB, CVX

Airbnb New User Bookings (Machine Learning)

January 1, 2016 – Present

• Implemented a statistical model that predicts the five most probable destination countries for Airbnb users using a boosting algorithm, achieving an accuracy score of 86.4992%.
• The dataset was provided by Airbnb, and features such as age, gender, signup method, and affiliate information were used to predict the probable destinations. Extreme Gradient Boosting (XGBoost) trees were used as the classifiers.
Toolkit: Python (NumPy, pandas, XGBoost, sklearn)
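As a small illustration of the "five most probable destinations" step, the sketch below ranks class probabilities and keeps the top five. It assumes the classifier has already produced one probability per destination; the function name `top_k_destinations` and the probability values are hypothetical and not taken from the project.

```python
def top_k_destinations(class_probabilities, k=5):
    """Return the k labels with the highest predicted probability, best first.

    `class_probabilities` maps a destination label to its predicted probability.
    """
    ranked = sorted(class_probabilities.items(),
                    key=lambda item: item[1],
                    reverse=True)
    return [label for label, _ in ranked[:k]]


# Made-up probabilities for a single user
example_probabilities = {
    "US": 0.50, "other": 0.15, "FR": 0.10, "IT": 0.08,
    "GB": 0.07, "ES": 0.06, "CA": 0.04,
}
example_top5 = top_k_destinations(example_probabilities)
```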

Perceptual Dissimilarity and Intra-Speaker Indication (Speech Processing)

January 1, 2016 – March 1, 2016

• Developed a statistical model to analyze the perceptual dissimilarity of different speakers and measured the intra-speaker indication.
• Features such as F0, F1, F2, F3, F4, HNR, CPP, H1-H2, MFCC coefficients, and LPCC were used to represent the speech utterances, and a classification error rate of 3.33% was obtained using AdaBoost trees.
Toolkit: MATLAB

Person of Interest (PoI) and Email Author Identification (Machine Learning)

November 1, 2015 – Present

Identified Enron employees who may have committed fraud, based on the public Enron financial and email dataset, with a precision of 0.3. Also identified email authors (using a dataset of over 70,000 emails for 7 POIs) with an accuracy score of 97%.
Toolkit: Python (nltk, sklearn)

Real-time Image processing for determining traffic density and computing the duration of the traffic light

May 1, 2012 – Present

The project counts the vehicles at a traffic junction by contour counting, then filters the detected contours by size. It also uses a pixel-counting technique to estimate traffic density at the junction, taking the ratio of white pixels to the total number of pixels as the measure of traffic. First, a sequence of images is acquired from the traffic light camera and the edges are detected using the most efficient edge detection technique. The resulting images are then used to compute the traffic density at the junction by the two methods above. The green light duration was determined by processing the resulting image. Finally, all these applications were consolidated into a single graphical user interface.
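The pixel-counting measure can be sketched in a few lines. The example assumes the edge-detected frame is already available as a binary image (a list of rows, with nonzero meaning white). The mapping from density to green-light time is not described in this profile, so the linear rule and the 10 to 60 second bounds in `green_light_seconds` are purely illustrative assumptions, as are both function names.

```python
def white_pixel_ratio(binary_image):
    """Traffic density estimate: white (nonzero) pixels divided by all pixels."""
    total = sum(len(row) for row in binary_image)
    if total == 0:
        raise ValueError("image has no pixels")
    white = sum(1 for row in binary_image for pixel in row if pixel)
    return white / total


def green_light_seconds(density, min_seconds=10.0, max_seconds=60.0):
    """Hypothetical rule: scale green time linearly with density in [0, 1]."""
    density = min(max(density, 0.0), 1.0)  # clamp to the valid range
    return min_seconds + density * (max_seconds - min_seconds)


# Tiny made-up edge map: 3 of 8 pixels are white
edge_map = [
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]
density = white_pixel_ratio(edge_map)
green_time = green_light_seconds(density)
```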

Text Independent Speaker Recognition using Gaussian Mixture Model

July 1, 2011 – October 1, 2011

I developed a robust statistical method to model a speaker's identity based on speaker-dependent spectral shapes. The model extracts features using a Mel cepstral representation and then estimates their probability distribution with a Gaussian Mixture Model.
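The identification step of such a system can be sketched as follows: each enrolled speaker has a fitted GMM, and an utterance is assigned to the speaker whose model gives its feature frames the highest log-likelihood. The sketch assumes diagonal covariances and already-trained parameters (EM training and Mel cepstral feature extraction are omitted); the function names and the toy models are hypothetical and not taken from the project.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for frame in frames:
        component_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for x, m, v in zip(frame, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            component_logs.append(log_p)
        # log-sum-exp over components for numerical stability
        peak = max(component_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in component_logs))
    return total


def identify_speaker(frames, speaker_models):
    """Pick the speaker whose GMM scores the frames highest.

    `speaker_models` maps a name to a (weights, means, variances) tuple.
    """
    return max(speaker_models,
               key=lambda name: gmm_log_likelihood(frames, *speaker_models[name]))


# Toy two-component models over 2-D features (made-up parameters)
toy_models = {
    "A": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "B": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
toy_frames = [[0.2, 0.1], [0.9, 1.1], [0.4, 0.3]]
predicted = identify_speaker(toy_frames, toy_models)
```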
