
Lead ML Engineer (Associate Director) | Ex-Microsoft | Ex-Siemens Princeton
I am a Lead AI/ML Engineer at Optum with 15 years of experience turning complex data challenges into impactful AI solutions. My work spans traditional ML, deep learning, NLP, and cutting-edge LLMs like GPT-4 Omni. I am passionate about not just building intelligent systems, but also shaping the teams and vision that bring them to life.
Iowa State University
PhD, Computer Engineering
August 1, 2006 – April 1, 2012
South Point High School
Higher Secondary
N/A – Present
Jadavpur University
Bachelor of Engineering, Computer Science & Engineering
N/A – Present
Optum
Lead AI/ML Engineer (Associate Director)
August 1, 2024 – Present
Noida, Uttar Pradesh, India · On-site
Microsoft
Technical Leader, Windows Data and Applied Science
November 1, 2020 – July 1, 2024
Noida, Uttar Pradesh, India
Accenture
Principal Researcher, Artificial Intelligence
November 1, 2017 – November 1, 2020
Noida, Uttar Pradesh, India
Impetus
Head of Data Science (India)
July 1, 2016 – October 1, 2017
Impetus
Senior Data Scientist
July 1, 2015 – October 1, 2017
Impetus
Data Scientist
August 1, 2013 – June 1, 2015
Case Commons, Inc.
Data Scientist
November 1, 2011 – August 1, 2013
New York, NY
Siemens Corporate Research
Research Intern
April 1, 2010 – February 1, 2011
Princeton, NJ
Iowa State University
Graduate Research Assistant
August 1, 2006 – April 1, 2010
Ames, IA
Automatic Text Extraction and Component Identification from P&IDs
December 1, 2016 – January 1, 2017
Worked with one of the largest multinational energy corporations. Developed algorithms to automatically identify text labels and components from millions of Piping & Instrumentation Diagrams, using Convolutional Neural Networks, Depth-first Search and custom heuristics.
Analysis of Fatal Defects in Car Body Paints
October 1, 2016 – Present
Working with one of the largest automobile manufacturers. The goal is to investigate how the physical conditions in a manufacturing plant influence the number of fatal paint defects in cars. Currently developing a prototype application, using a novel ensemble of conditional inference trees and custom heuristics, to identify the strongest predictors of defects and the critical values of those predictors. Also developing a novel application with Random Forest and genetic algorithms to recommend the optimal physical conditions for minimizing the defects per vehicle.
Generating Insights on Customer Retention
August 1, 2016 – September 1, 2016
The project was with one of the largest wireless communications service providers in the US. The goal was to automatically identify the factors that lead customers to disconnect or not renew after contract expiry. Designed and implemented a novel randomized decision-tree-based algorithm that generates such insights and prunes them to keep only the most relevant ones. Rigorously derived analytical guarantees and implemented the algorithm in R on data for two million customers and more than 100 features.
Online Activity Detection on Twitter
January 1, 2016 – February 1, 2016
The goal was to detect, from raw packet captures of Twitter traffic, which online activities a user performed in a session (e.g., Tweet with Image, Text-Only Tweet, Other) and to precisely identify the start and end times of those activities. Developed custom algorithms to define sessions and flows on raw packet data and to engineer features. Used a deep learning autoencoder (in R + H2O) with 10 hidden layers for noise removal, and Naive Bayes with class-conditional densities obtained from kernel density estimates. Tested on two independent test datasets. On the first, detected Tweet + Image with 95.53% recall and 99.99% precision, Tweet Text Only with 91.69% recall and 35% precision, and Other with 51.35% recall and 94.27% precision. On the second, obtained 89.68% recall and 100% precision for Tweet + Image, and 78.93% recall and 90.27% precision for Tweet Text Only.
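As an illustration of Naive Bayes with class-conditional densities taken from kernel density estimates, here is a minimal sketch in plain Python. It is not the project's code (which used R + H2O). The Gaussian kernel, the fixed bandwidth, the class name `KDENaiveBayes`, and the toy data are all assumptions made for this example.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D density function estimated from samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

class KDENaiveBayes:
    """Naive Bayes where each per-feature class-conditional density is a KDE (hypothetical helper)."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # assumed fixed; in practice it would be tuned

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.densities = {}
        n_features = len(X[0])
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # One independent 1-D KDE per feature (the "naive" assumption).
            self.densities[c] = [
                gaussian_kde([r[j] for r in rows], self.bandwidth)
                for j in range(n_features)
            ]
        return self

    def predict(self, x):
        def score(c):
            # Log prior plus the sum of log densities; the tiny constant avoids log(0).
            return self.log_prior[c] + sum(
                math.log(d(v) + 1e-300) for d, v in zip(self.densities[c], x)
            )
        return max(self.classes, key=score)

# Toy one-feature data with made-up values and labels.
X = [[0.0], [0.2], [5.0], [5.2]]
y = ["text_only", "text_only", "tweet_image", "tweet_image"]
model = KDENaiveBayes(bandwidth=1.0).fit(X, y)
```

Calling `model.predict([0.1])` scores each class by its log prior plus the summed per-feature log densities and returns the highest-scoring label.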
Identifying Peers for Corporate Credit Card Clients
May 1, 2015 – December 1, 2015
This was done for one of the largest financial institutions in the world. Clients with corporate credit cards use those cards for purchasing supplies and managing travel and other expenses. The goal was to identify peers of clients based on their revenues, geographies, industries, patterns of purchases from suppliers, patterns of delinquency, etc., in order to recommend suppliers to the clients effectively, thus driving the card-members’ engagement. Designed novel distance metrics for heterogeneous data and experimented with multiple clustering algorithms. A naive all-pairs distance computation for 78,000 clients would have required more than 3 billion distance computations, a projected run time of 11 days. Restricting the nearest-neighbor search for each client to its own cluster gave a speedup of more than 500 times.
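The pair counts behind this speedup can be checked with a short calculation. The Python sketch below counts unordered client pairs for the naive all-pairs approach and for a search restricted to each client's own cluster. The cluster layout (600 equal clusters of 130 clients) is a hypothetical illustration, not a figure from the project.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def within_cluster_pair_count(cluster_sizes) -> int:
    """Pairs compared when each item is only compared within its own cluster."""
    return sum(pair_count(m) for m in cluster_sizes)

# Naive all-pairs count for the 78,000 clients mentioned in the text.
naive = pair_count(78_000)

# Hypothetical clustering: 600 clusters of 130 clients each (600 * 130 = 78,000).
hypothetical_sizes = [130] * 600
restricted = within_cluster_pair_count(hypothetical_sizes)

# Ratio of distance computations, ignoring the cost of the clustering itself.
speedup = naive / restricted
```

With roughly equal cluster sizes, the ratio is close to the number of clusters, so a speedup above 500 is plausible once the clients are split into several hundred clusters.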
Detecting Fraudulent Intent from Employee’s Emails
September 1, 2014 – December 1, 2014
This was done for one of the largest banks in the USA, which had faced heavy financial penalties in the last few years because of internal fraud. Evaluated several NLP-based products to select one that could detect fraud (money laundering, antitrust violations, tying, etc.) in emails exchanged among employees. Used semi-supervised/active learning algorithms because the training set was very small (< 100 emails) and even a 5% sample of the test set involved 30 million emails per month.
Detecting Fraudulent Claims from Medicare Data
March 1, 2014 – June 1, 2014
Given demographic data for 100 million patients, 300 million procedures they underwent, and a labeled subset of 50,000 patients, identified an additional 10,000 patients as most likely fraudulent, using a logistic regression model that gave a sensitivity of 96.4% and a specificity of 92.95%. This was part of the Cloudera data science challenge, which I won.
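For reference, sensitivity and specificity are computed from the confusion-matrix counts as TP / (TP + FN) and TN / (TN + FP). The Python sketch below shows the calculation on a tiny set of made-up labels. The function names and the data are illustrative only and are unrelated to the challenge data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 marks the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    """Fraction of actual positives that were flagged (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives that were left unflagged (true negative rate)."""
    return tn / (tn + fp)

# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
```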
Identifying Bot-infected Sessions from Network Traces
December 1, 2013 – February 1, 2014
Implemented a predictive classification system for separating bot-infected sessions from benign sessions using a stacking ensemble of three classification algorithms: SVM with RBF kernel, decision tree and neural network. Identified the features through exploratory data analysis that showed the difference in behavior between bot and user traffic. Extracted n-grams from sequences of URLs visited, and selected ~50 features out of more than 5,500 by computing information gain. Achieved a false negative rate of 10.3% on the test set, with a false positive rate of 6.8% and an overall error rate of 7.8% on the same test set.
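Feature selection by information gain can be sketched as follows. This is a minimal Python illustration, assuming discrete feature values (for example, whether an n-gram is present in a session). The function names and the toy features are hypothetical and are not taken from the project.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus their conditional entropy given a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(features, labels, k):
    """Rank named feature columns by information gain and keep the best k."""
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]

# Toy data: 1 = bot session, 0 = benign session (made-up values).
labels = [0, 0, 1, 1]
features = {
    "ngram_a": [0, 0, 1, 1],  # perfectly separates the classes
    "ngram_b": [0, 1, 0, 1],  # carries no information about the class
}
selected = top_k_features(features, labels, k=1)
```

In this toy case the first feature has an information gain of one bit and the second has zero, so only the first is kept. Applying the same ranking with a larger `k` corresponds to keeping the ~50 highest-scoring features out of the full set.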
Predicting Healthcare Expenditure Increase for an Individual from Medicare Data
September 1, 2013 – November 1, 2013
Worked with anonymized but publicly available Medicare data, which has more than 114,000 beneficiaries and more than 12,400 features. Addressed the problem of accurately predicting which beneficiaries' inpatient claim amounts increased between 2008 and 2009, using an ensemble of six different classification algorithms: Gradient Boosting Machine, Conditional Inference Tree, Neural Networks, SVM, Logistic Regression and Naive Bayes. Showed that kidney conditions, COPD, hypertension, stroke/transient ischemic attack, cancer and osteoporosis are among the most influential conditions behind expenditure increase. Paper accepted in Health-Informatics KDD (HI-KDD) 2014.
Generative AI with Large Language Models
DeepLearning.AI, Amazon Web Services
June 24, 2026 – Present
Machine Learning
Coursera
June 24, 2026 – Present
Neural Networks and Deep Learning
Coursera
June 24, 2026 – Present
Cultural Fit Analysis
The candidate's diverse project portfolio, ranging from academic research to industry applications across multiple sectors, indicates a strong adaptability and willingness to tackle varied challenges. Their experience in leading teams and contributing to internal conferences (MLADS, MSJAR) suggests a collaborative and knowledge-sharing mindset. The breadth of skills and continuous learning (certifications in Generative AI, Machine Learning, Deep Learning) align well with a culture of innovation and continuous improvement. The candidate's long tenure in technical roles and progression to leadership positions also suggests commitment and growth potential.
Soft Skills & Operational Fit
The candidate's extensive experience in leading teams, managing projects, and presenting research (publications, conferences) suggests strong communication and collaboration skills. Their ability to translate complex technical solutions into tangible business outcomes (cost savings, accuracy improvements) indicates a strong operational fit and business acumen. The diversity of projects across various industries (automotive, finance, healthcare, retail, telecommunications) demonstrates adaptability and a broad problem-solving approach.