
AI Engineer @ Synechron
GenAI Engineer focused on LLM systems, inference optimization, and production AI infrastructure. I’m most interested in the part of ML that starts after the model works: serving real traffic, scaling GPUs efficiently, handling drift, deploying reliably, and building systems that stay stable under production load.
At Findability Sciences, I built Stretto AI, a production RAG platform for bankruptcy legal research using AWS Bedrock, LangGraph orchestration, and hybrid retrieval pipelines. That experience pushed me deeper into distributed training, inference benchmarking, MLOps, and evaluation systems.
Since then, I’ve built:
• LLM serving benchmarks on real RTX 4090 hardware
• Distributed training systems with FSDP/DDP/Ray Train
• Self-healing MLOps infrastructure on AWS EKS
• Multi-agent systems with evaluation and CI/CD guardrails
I also write about LLM infrastructure, vLLM internals, model monitoring, and recommender systems on Medium.
Worcester Polytechnic Institute
Master of Science - MS, Data Science
August 2023 – May 2025
International Institute of Information Technology Bangalore
Postgraduate Degree, Data Science - BI and Analytics Track
January 2020 – January 2021
Synechron
Generative AI Engineer
June 2026 – Present
North Carolina, United States · Hybrid
Community Dreams Foundation
Data Scientist
September 2025 – May 2026
Findability Sciences
Generative AI
August 2024 – December 2024
United States · Remote
Purplle.com
Marketing Analyst Intern
December 2021 – March 2022
Mumbai, Maharashtra, India · Remote
1) FlashAttention Forward Pass - Triton GPU Kernel
June 2026 – Present
● Implemented the FlashAttention forward pass in Triton from scratch, with non-causal attention plus an optional causal mask.
● Standard attention materializes an [N, N] score matrix in HBM, and at N=4096 that matrix is the memory bottleneck. FlashAttention removes it by tiling Q/K/V into SRAM and applying an online softmax, so the full matrix is never materialized.
● Results on RTX 4090 (causal), speedup over PyTorch standard attention:
- N=512: 1.25x faster
- N=1024: 1.32x faster
- N=2048: 2.19x faster
- N=4096: 6.34x faster
● What's implemented:
- Tiled attention with online softmax (running max + running sum per row)
- Causal mask (upper-triangular -inf)
- PyTorch integration via torch.autograd.Function + nn.Module
- Works with torch.compile
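The online-softmax recurrence described above can be sketched in plain Python for a single query row. This is an illustrative scalar version, not the Triton kernel. The function names and `block_size` are hypothetical, and only the non-causal case is covered. The tiled result is checked against a reference implementation that does compute every score.

```python
import math
import random


def naive_attention_row(q, K, V):
    """Reference: compute all N scores for one query row, then softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) * scale for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def online_attention_row(q, K, V, block_size=4):
    """Visit K/V one tile at a time, keeping only per-row running state:
    m (running max score), l (running sum of exp(score - m)) and
    acc (running sum of exp(score - m) * v, still unnormalized)."""
    scale = 1.0 / math.sqrt(len(q))
    m, l = -math.inf, 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block_size):
        K_blk, V_blk = K[start:start + block_size], V[start:start + block_size]
        s_blk = [sum(qi * ki for qi, ki in zip(q, k)) * scale for k in K_blk]
        m_new = max(m, max(s_blk))
        # Rescale the old statistics from the old max to the new max.
        correction = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first tile
        p_blk = [math.exp(s - m_new) for s in s_blk]
        l = l * correction + sum(p_blk)
        acc = [a * correction + sum(p * v[j] for p, v in zip(p_blk, V_blk))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]  # normalize once, after the last tile


if __name__ == "__main__":
    rng = random.Random(0)
    d, d_v, n = 8, 4, 10
    q = [rng.gauss(0, 1) for _ in range(d)]
    K = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]
    ref = naive_attention_row(q, K, V)
    out = online_attention_row(q, K, V, block_size=3)
    print(max(abs(a - b) for a, b in zip(ref, out)))  # prints a value near zero
```

Only `m`, `l`, and `acc` persist between tiles. That is why the kernel never needs the full [N, N] score matrix in memory.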
2) Multi-Agent Customer Support System - LangGraph, MCP, FastAPI, PostgreSQL
May 2026
● Built a production-grade multi-agent system on LangGraph. A supervisor routes each request to one of 3 specialist agents (billing, technical, general), each connected to its own MCP server container. The billing agent runs agentic RAG against ChromaDB before web search. A PostgreSQL checkpointer persists all conversation state across restarts, and responses stream token by token over SSE.
● A LangSmith eval pipeline measures 100% routing accuracy (20/20 test cases). GitHub Actions CI/CD runs the supervisor eval on every push and blocks merges if accuracy drops below 100%. The system also includes input/output guardrails, human-in-the-loop approval for billing responses, and auto-retry from the last checkpoint on failure.
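The route-evaluate-gate loop described above needs very little logic. The sketch below is hypothetical. `keyword_supervisor` is a plain keyword stand-in for the LLM supervisor and does not use any LangGraph or LangSmith API. The labelled examples are also invented. The sketch shows how a routing-accuracy score becomes a CI exit code that can block a merge.

```python
def keyword_supervisor(message: str) -> str:
    """Hypothetical stand-in for the LLM supervisor: choose one specialist."""
    text = message.lower()
    if any(w in text for w in ("invoice", "refund", "charge", "bill")):
        return "billing"
    if any(w in text for w in ("error", "crash", "api", "login")):
        return "technical"
    return "general"


def routing_accuracy(router, dataset):
    """Fraction of (message, expected_route) pairs routed correctly."""
    correct = sum(1 for message, expected in dataset if router(message) == expected)
    return correct / len(dataset)


def ci_exit_code(accuracy: float, threshold: float = 1.0) -> int:
    """0 passes the CI job; non-zero fails it, which a required check turns into a blocked merge."""
    return 0 if accuracy >= threshold else 1


EVAL_SET = [  # invented examples, not the project's 20-case eval set
    ("I was charged twice this month", "billing"),
    ("Can I get a refund for my last invoice?", "billing"),
    ("The API returns a 500 error", "technical"),
    ("What are your support hours?", "general"),
]

acc = routing_accuracy(keyword_supervisor, EVAL_SET)
print(f"routing accuracy: {acc:.0%}, exit code: {ci_exit_code(acc)}")
# A CI script would pass ci_exit_code(acc) to sys.exit().
```

A threshold of 1.0 mirrors the "block below 100%" policy. With a larger or noisier eval set, a lower threshold would tolerate occasional misroutes.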
3) Production MLOps Platform - AWS EKS, Terraform, Helm
May 2026
● Self-healing MLOps platform on AWS EKS with MLflow quality-gated model promotion, automated drift detection, and hot-reload inference. Terraform provisions the full AWS stack (VPC, EKS, ECR, S3, IAM/OIDC) in a single apply, and the GPU node group autoscales from 0 to 1 node on training demand.
● GitHub Actions CI/CD (test, ECR push, rolling redeploy), PostgreSQL prediction logging with automatic retraining when drift is detected, and live Prometheus + Grafana dashboards, all running on production AWS infrastructure.
4) LLM Serving Optimization - vLLM, AWQ, GPTQ, Marlin on RTX 4090
May 2026
● Benchmarked 5 inference backends (naive HuggingFace, vLLM, AWQ, GPTQ, GPTQ Marlin) on a real RTX 4090. vLLM's continuous batching + PagedAttention delivered a ~81x throughput gain over naive serving (32 to 2,596 tok/s at concurrency=100).
● The same GPTQ weights served through the Marlin kernel reached 3,023 tok/s vs 1,718 tok/s with the default GPTQ kernel. That is 1.76x throughput from kernel engineering alone, with no change to the quantization algorithm or weights.
5) Distributed LLM Training - Llama-3-8B, DDP, FSDP, Ray Train
May 2026
● Benchmarked DDP, FSDP, and Ray Train for fine-tuning Llama-3-8B on real Vast.ai hardware. FSDP cut per-rank memory by ~79% (77GB to 15.8GB), enabling batch=16. 2-GPU FSDP at $2.10/hr outperformed 4-GPU DDP at $4.20/hr in both throughput and cost.
● Reproduced and resolved 8 training failure modes (OOM and numerical) from first principles, including gradient-bucket pre-allocation OOMs, an FSDP sharding bug, and FP16 NaNs from SiLU overflow. Tracked all experiments with MLflow.
NVIDIA-Certified Associate: AI Infrastructure and Operations
NVIDIA
June 25, 2026 – Present
Machine Learning A-Z™: Hands-On Python & R In Data Science
Udemy
June 25, 2026 – Present
Large Language Models (LLMs), Transformers & GPT A-Z
SuperDataScience
June 25, 2026 – Present
Google Data Analytics
June 25, 2026 – Present
Long-Term Agentic Memory With LangGraph!
DeepLearning.AI
June 25, 2026 – Present
Evaluating AI Agents
DeepLearning.AI
June 25, 2026 – Present
Preparatory Course
International Institute of Information Technology Bangalore
June 25, 2026 – Present
IBM Data Science
IBM
June 25, 2026 – Present
Agentforce Champion
Salesforce
June 25, 2026 – Present
AI Agents Fundamentals
Hugging Face
June 25, 2026 – Present
Post-Training of LLMs
DeepLearning.AI
June 25, 2026 – Present
Certificate of completion - A/B Testing in Python
365 Data Science
June 25, 2026 – Present
Deep Learning A-Z™: Hands-On Artificial Neural Networks
Udemy
June 25, 2026 – Present
Data Toolkit
International Institute of Information Technology Bangalore
June 25, 2026 – Present
Cultural Fit Analysis
The candidate's project portfolio is highly diverse, covering areas from low-level GPU optimization to high-level multi-agent systems and MLOps. This breadth of technical interest and capability suggests adaptability and a strong learning orientation, which are positive indicators for cultural fit in a dynamic ML engineering environment. The projects are all personal, which demonstrates strong initiative and self-driven learning. The target role of ML Engineer is well-aligned with the candidate's demonstrated skills and project focus, particularly in Generative AI and MLOps. The candidate's educational background in Data Science further supports this alignment.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a strong problem-solving aptitude, particularly in identifying and resolving complex technical challenges like OOM errors in distributed training and optimizing inference throughput. The focus on production-grade systems, CI/CD, and evaluation pipelines suggests an operational mindset and attention to reliability and quality. The detailed performance metrics and comparative analyses in projects demonstrate an analytical and results-oriented approach.