Bibudh Lahiri

ML Engineer

https://www.opentalent.in/bibudh-lahiri

Lead ML Engineer (Associate Director) | Ex-Microsoft | Ex-Siemens Princeton

Key Strengths

Extensive experience (20 years) in Machine Learning and AI, directly aligning with the ML Engineer target role.
Proven track record of architecting and deploying end-to-end ML solutions, including multimodal LLMs and extreme classification, with significant business impact (e.g., saving $200k, improving adjudication accuracy by 16%).
Deep expertise in various ML algorithms (SVM, Decision Trees, Neural Networks, Random Forest, Gradient Boosting, Logistic Regression, Naive Bayes, Autoencoders, CNNs) and data science techniques (NLP, causal ML, time-series analysis, clustering).
Strong academic background with a PhD in Computer Engineering and numerous publications, demonstrating a research-oriented and problem-solving mindset.
Experience in leadership roles (Lead AI/ML Engineer, Technical Leader, Head of Data Science) indicates ability to drive projects and mentor teams.
Demonstrated ability to innovate and develop novel algorithms (e.g., randomized decision-tree-based algorithm, custom heuristics, novel distance metrics).
Experience with large-scale data processing and distributed systems (Apache Spark, Hadoop) for fraud detection and other applications.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's diverse project portfolio, ranging from academic research to industry applications across multiple sectors, indicates strong adaptability and a willingness to tackle varied challenges. Their experience in leading teams and contributing to internal conferences (MLADS, MSJAR) suggests a collaborative and knowledge-sharing mindset. The breadth of skills and continuous learning (certifications in Generative AI, Machine Learning, Deep Learning) aligns well with a culture of innovation and continuous improvement. The candidate's long tenure in technical roles and progression to leadership positions also suggests commitment and growth potential.

Soft Skills & Operational Fit

The candidate's extensive experience in leading teams, managing projects, and presenting research (publications, conferences) suggests strong communication and collaboration skills. Their ability to translate complex technical solutions into tangible business outcomes (cost savings, accuracy improvements) indicates a strong operational fit and business acumen. The diversity of projects across various industries (automotive, finance, healthcare, retail, telecommunications) demonstrates adaptability and a broad problem-solving approach.

About

I am a Lead AI/ML Engineer at Optum with 15 years of experience turning complex data challenges into impactful AI solutions. My work spans traditional ML, deep learning, NLP, and cutting-edge LLMs like GPT-4 Omni. I am passionate about not just building intelligent systems, but also shaping the teams and vision that bring them to life.

Top Skills

PyTorch, Deep Learning, Solution Architecture, Machine Learning, Algorithms, Big Data, Data Analysis, Java, Distributed Systems, Data Mining, Hadoop, Analytics, Software Development, Python, Business Intelligence, Data Warehousing, R, Software Engineering, Parallel Computing, MPI, Apache Spark, MapReduce, Cluster Management

Education

Iowa State University

PhD, Computer Engineering

August 1, 2006 – April 1, 2012

South Point High School

Higher Secondary

Jadavpur University

Bachelor of Engineering, Computer Science & Engineering

Experience

Optum

Lead AI/ML Engineer (Associate Director)

August 1, 2024 – Present

Noida, Uttar Pradesh, India · On-site

Microsoft

Technical Leader, Windows Data and Applied Science

November 1, 2020 – July 1, 2024

Noida, Uttar Pradesh, India

Accenture

Principal Researcher, Artificial Intelligence

November 1, 2017 – November 1, 2020

Noida, Uttar Pradesh, India

Impetus

Head of Data Science (India)

July 1, 2016 – October 1, 2017

Impetus

Senior Data Scientist

July 1, 2015 – October 1, 2017

Impetus

Data Scientist

August 1, 2013 – June 1, 2015

Case Commons, Inc.

Data Scientist

November 1, 2011 – August 1, 2013

New York, NY

Siemens Corporate Research

Research Intern

April 1, 2010 – February 1, 2011

Princeton, NJ

Iowa State University

Graduate Research Assistant

August 1, 2006 – April 1, 2010

Ames, IA

Projects

Automatic Text Extraction and Component Identification from P&IDs

December 1, 2016 – January 1, 2017

Worked with one of the largest multinational energy corporations. Developed algorithms to automatically identify text labels and components from millions of Piping & Instrumentation Diagrams, using Convolutional Neural Networks, Depth-first Search and custom heuristics.

Analysis of Fatal Defects in Car Body Paints

October 1, 2016 – Present

Working with one of the largest automobile manufacturers. The goal is to investigate how the physical conditions in a manufacturing plant influence the number of fatal paint defects in cars. Currently developing a prototype application, using a novel ensemble of conditional inference trees and custom heuristics, to identify the strongest predictors of defects and the critical values of those predictors. Also developing a novel application with Random Forest and genetic algorithms to recommend the optimal physical conditions for minimizing the defects per vehicle.

Generating Insights on Customer Retention

August 1, 2016 – September 1, 2016

The project was with one of the largest wireless communications service providers in the US. The goal was to automatically identify the factors leading to customers not renewing or disconnecting after contract expiry. Designed and implemented a novel randomized decision-tree-based algorithm that generates such insights and prunes them to keep only the most relevant. Derived analytical guarantees for the algorithm and implemented it in R on data covering two million customers and more than 100 features.

Online Activity Detection on Twitter

January 1, 2016 – February 1, 2016

The goal was to detect, from raw packet captures of Twitter traffic, which online activities a user performed in a session (e.g., Tweet with Image, Text-Only Tweet, Other), and to precisely identify the start and end times of those activities. Developed custom algorithms to define sessions and flows on raw packet data and to engineer features. Used a deep learning autoencoder (in R + H2O) with 10 hidden layers for noise removal, and Naive Bayes with class-conditional densities obtained from kernel density estimates. Tested on two independent test datasets. On the first, detected Tweet + Image with 95.53% recall and 99.99% precision, Tweet Text Only with 91.69% recall and 35% precision, and Other with 51.35% recall and 94.27% precision. On the second, obtained 89.68% recall and 100% precision for Tweet + Image, and 78.93% recall and 90.27% precision for Tweet Text Only.
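As an illustration of the classifier described above, the following is a minimal Python sketch of Naive Bayes with class-conditional densities taken from Gaussian kernel density estimates. It is not the original R + H2O implementation: the class name `KdeNaiveBayes`, the fixed `bandwidth`, the density `floor`, and the toy feature values are all assumptions made for the example.

```python
from math import exp, log, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from samples."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


class KdeNaiveBayes:
    """Naive Bayes whose per-feature class-conditional densities are KDEs."""

    def __init__(self, bandwidth=1.0, floor=1e-12):
        self.bandwidth = bandwidth  # hypothetical fixed bandwidth, not tuned
        self.floor = floor          # avoids log(0) far from every training point
        self.log_prior = {}
        self.densities = {}

    def fit(self, rows, labels):
        by_class = {}
        for row, label in zip(rows, labels):
            by_class.setdefault(label, []).append(row)
        n_features = len(rows[0])
        for label, members in by_class.items():
            self.log_prior[label] = log(len(members) / len(rows))
            # One independent density per feature (the "naive" assumption).
            self.densities[label] = [
                gaussian_kde([m[j] for m in members], self.bandwidth)
                for j in range(n_features)
            ]
        return self

    def predict(self, row):
        def score(label):
            log_likelihood = sum(
                log(max(density(x), self.floor))
                for density, x in zip(self.densities[label], row)
            )
            return self.log_prior[label] + log_likelihood

        return max(self.densities, key=score)


# Toy data: one made-up feature per flow, two activity classes.
rows = [[0.0], [0.2], [0.1], [5.0], [5.2], [4.9]]
labels = ["text_only", "text_only", "text_only", "image", "image", "image"]
model = KdeNaiveBayes(bandwidth=0.5).fit(rows, labels)
```

Kernel density estimates let each feature follow an arbitrary shape per class instead of forcing a Gaussian fit, at the cost of evaluating every training sample at prediction time.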

Identifying Peers for Corporate Credit Card Clients

May 1, 2015 – December 1, 2015

This was done for one of the largest financial institutions in the world. Clients with corporate credit cards use those cards for purchasing supplies and managing travel and other expenses. The goal was to identify peers of clients based on their revenues, geographies, industries, patterns of purchases from suppliers, patterns of delinquency, etc., in order to recommend suppliers to the clients effectively and thus drive card-members’ engagement. Designed novel distance metrics for heterogeneous data and experimented with multiple clustering algorithms. A naive all-pairs distance computation for 78,000 clients would have required more than 3 billion distance computations, a projected 11 days of run time. Restricting the nearest-neighbor search for each client to its own cluster gave a speedup of more than 500 times.
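The arithmetic behind the all-pairs figure, and the idea of restricting the neighbor search to a client's own cluster, can be sketched as follows. This is an illustrative Python sketch only: the function names, the toy distance, and the assumption of 500 equal-sized clusters are hypothetical, since the profile states neither the number of clusters nor their sizes.

```python
from collections import defaultdict
from math import comb


def all_pairs_count(n):
    """Number of unordered pairs among n clients."""
    return comb(n, 2)


def within_cluster_pairs(cluster_sizes):
    """Number of unordered pairs when comparisons stay inside each cluster."""
    return sum(comb(size, 2) for size in cluster_sizes)


def nearest_neighbor_within_cluster(points, labels, distance):
    """Map each point index to its nearest neighbor in the same cluster."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)

    nearest = {}
    for members in clusters.values():
        for i in members:
            best_index, best_distance = None, None
            for j in members:
                if i == j:
                    continue
                d = distance(points[i], points[j])
                if best_distance is None or d < best_distance:
                    best_index, best_distance = j, d
            nearest[i] = best_index  # None for a singleton cluster
    return nearest


naive = all_pairs_count(78_000)                 # 3,041,961,000 pairs
# Hypothetical split: 500 clusters of 156 clients each (500 * 156 = 78,000).
restricted = within_cluster_pairs([156] * 500)  # 6,045,000 pairs
speedup = naive / restricted                    # roughly 503x under this split

# Toy usage with a one-dimensional distance.
points = [0.0, 1.0, 10.0, 11.5, 0.4]
labels = [0, 0, 1, 1, 0]
neighbors = nearest_neighbor_within_cluster(points, labels, lambda a, b: abs(a - b))
```

With k roughly equal clusters, the pair count drops by about a factor of k, which is why a few hundred clusters is enough to reach a speedup of this order. Unequal cluster sizes would give a smaller reduction.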

Detecting Fraudulent Intent from Employee’s Emails

September 1, 2014 – December 1, 2014

This was done for one of the largest banks in the USA, which had faced heavy financial penalties in the last few years because of internal fraud. Worked to select, from among a few candidates, an NLP-based product that can detect fraud (money laundering, anti-trust, tying, etc.) from emails exchanged among employees. Used semi-supervised/active learning algorithms because the training set was very small (< 100 emails) and even a 5% sample of the test set involved 30 million emails per month.

Detecting Fraudulent claims from Medicare data

March 1, 2014 – June 1, 2014

Given demographic data for 100 million patients, 300 million procedures they underwent, and a labeled subset of 50,000 patients, identified an additional 10,000 patients as most likely fraudulent, using a logistic regression model that gave a sensitivity of 96.4% and a specificity of 92.95%. This was part of the Cloudera data science challenge that I won.
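For readers less familiar with the two metrics quoted above, here is a small Python sketch of how sensitivity and specificity are computed from binary labels. The function name and the toy labels are made up for illustration and have no relation to the challenge data.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels, 1 = fraudulent."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # share of fraudulent cases that were caught
    specificity = tn / (tn + fp)  # share of legitimate cases left alone
    return sensitivity, specificity


# Toy labels: 3 fraudulent and 5 legitimate patients.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(actual, predicted)
```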

Identifying Bot-infected Sessions from Network Traces

December 1, 2013 – February 1, 2014

Implemented a predictive classification system for separating bot-infected sessions from benign sessions using a stacking ensemble of three classification algorithms: SVM with RBF kernel, decision tree and neural network. Identified the features through exploratory data analysis that showed the difference in behavior between bot and user traffic. Extracted n-grams from sequences of URLs visited, and selected ~50 features out of more than 5,500 by computing information gain. On the test set, achieved a false negative rate of 10.3% while keeping the false positive rate at 6.8% and the overall error rate at 7.8%.
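The feature-selection step above ranks features by information gain, that is, the reduction in label entropy after splitting on a feature. A minimal Python sketch for categorical features follows. The function names, feature names, and toy data are hypothetical, and the original work may have discretized or scored features differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: 1 = bot session, 0 = benign session.
labels = [1, 1, 0, 0]
columns = {
    "has_ngram_a": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "has_ngram_b": ["yes", "no", "yes", "no"],  # carries no information here
}
selected = top_k_features(columns, labels, k=1)
```

Scoring each feature independently like this is cheap enough to apply to thousands of n-gram features, though it ignores redundancy between features that carry the same signal.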

Predicting Healthcare Expenditure Increase for an Individual from Medicare Data

September 1, 2013 – November 1, 2013

Worked with anonymized but publicly available Medicare data, which has more than 114,000 beneficiaries and more than 12,400 features. Addressed the problem of accurately predicting which beneficiaries' inpatient claim amounts increased between 2008 and 2009, using an ensemble of six different classification algorithms: Gradient Boosting Machine, Conditional Inference Tree, Neural Networks, SVM, Logistic Regression and Naive Bayes. Showed that kidney conditions, COPD, hypertension, stroke/transient ischemic attack, cancer and osteoporosis are among the most influential conditions behind expenditure increase. Paper accepted in Health-Informatics KDD (HI-KDD) 2014.

Certifications

Generative AI with Large Language Models

DeepLearning.AI, Amazon Web Services

June 24, 2026 – Present

Machine Learning

Coursera

June 24, 2026 – Present

Neural Networks and Deep Learning

Coursera

June 24, 2026 – Present
