Tanish Kandivlikar

ML Engineer

Key Strengths

Deep expertise in LLM serving optimization, including vLLM, AWQ, GPTQ, and Marlin kernels, demonstrating a strong understanding of performance bottlenecks and solutions.
Proficiency in distributed LLM training with DDP, FSDP, and Ray Train, showcasing practical experience in scaling model training and optimizing resource utilization.
Experience in building production-grade multi-agent systems using LangGraph, FastAPI, and PostgreSQL, indicating strong system design and implementation skills for AI applications.
Hands-on experience with MLOps platforms on AWS EKS, Terraform, and Helm, highlighting capabilities in infrastructure as code, CI/CD, and automated model deployment.
Demonstrated ability to implement low-level GPU kernels (FlashAttention in Triton), indicating a strong grasp of performance-critical ML operations and hardware interaction.
Extensive project experience in Generative AI, RAG systems, and LLM orchestration, directly aligning with the target ML Engineer role.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's project portfolio is highly diverse, covering areas from low-level GPU optimization to high-level multi-agent systems and MLOps. This breadth of technical interest and capability suggests adaptability and a strong learning orientation, which are positive indicators for cultural fit in a dynamic ML engineering environment. The projects are all personal, which demonstrates strong initiative and self-driven learning. The target role of ML Engineer is well-aligned with the candidate's demonstrated skills and project focus, particularly in Generative AI and MLOps. The candidate's educational background in Data Science further supports this alignment.

Soft Skills & Operational Fit

The candidate's project descriptions indicate a strong problem-solving aptitude, particularly in identifying and resolving complex technical challenges like OOM errors in distributed training and optimizing inference throughput. The focus on production-grade systems, CI/CD, and evaluation pipelines suggests an operational mindset and attention to reliability and quality. The detailed performance metrics and comparative analyses in projects demonstrate an analytical and results-oriented approach.

About

GenAI Engineer focused on LLM systems, inference optimization, and production AI infrastructure. I’m most interested in the part of ML that starts after the model works: serving real traffic, scaling GPUs efficiently, handling drift, deploying reliably, and building systems that stay stable under production load.

At Findability Sciences, I built Stretto AI, a production RAG platform for bankruptcy legal research using AWS Bedrock, LangGraph orchestration and hybrid retrieval pipelines. That experience pushed me deeper into distributed training, inference benchmarking, MLOps, and evaluation systems. Since then, I’ve built:

• LLM serving benchmarks on real RTX 4090 hardware
• Distributed training systems with FSDP/DDP/Ray Train
• Self-healing MLOps infrastructure on AWS EKS
• Multi-agent systems with evaluation and CI/CD guardrails

I also write about LLM infrastructure, vLLM internals, model monitoring, and recommender systems on Medium.

Top Skills

MongoDB, FastAPI, Infrastructure, GPU, Microsoft Azure, GitOps, Jenkins, LangGraph, NLP Libraries, Infrastructure as Code (IaC), Data Engineering, Agent Development, Multi-Agent Systems, Apache Spark ML, CUDA, Fine Tuning, Machine Learning Algorithms, Data Analytics, Reinforcement Learning, Data Mining, Distributed Training, Amazon EC2, Containerization, Natural Language Processing (NLP), TensorFlow, Artificial Intelligence (AI), Apache Kafka, AI Agents, Computer Vision, Ansible, Terraform, Microservices, DevOps, Kubernetes, PyTorch, Amazon S3, Bash, Generative AI, LLMs, Feature Engineering, Redis, MLOps, Snowflake, AWS SageMaker, Docker, MLflow, Statistical Analysis, Exploratory Data Analysis, Web Scraping, Statistics, Apache Spark, Hypothesis Testing, Git, Databases, English, Jupyter, GitHub, Microsoft Excel, Microsoft Word, Data Analysis, Data Warehousing, Data Visualization, Data Science, SQL, Tableau, Microsoft Power BI, Machine Learning, Deep Learning

Skills

Experience

Synechron

Generative AI Engineer

June 1, 2026 – Present

North Carolina, United States · Hybrid

Community Dreams Foundation

Data Scientist

September 1, 2025 – May 1, 2026

Findability Sciences

Generative AI

August 1, 2024 – December 1, 2024

United States · Remote

Purplle.com

Marketing Analyst Intern

December 1, 2021 – March 1, 2022

Mumbai, Maharashtra, India · Remote

Projects

1) FlashAttention Forward Pass - Triton GPU Kernel

June 1, 2026 – Present

● Implemented the FlashAttention forward pass in Triton from scratch, with optional causal masking.
● Standard attention allocates an [N, N] score matrix in HBM; at N=4096, that matrix is the memory bottleneck. FlashAttention eliminates it by tiling Q/K/V into SRAM and using online softmax, so the full matrix is never materialized.
● Results on RTX 4090 (causal):
  - N=512: 1.25x faster than PyTorch standard attention
  - N=1024: 1.32x faster
  - N=2048: 2.19x faster
  - N=4096: 6.34x faster
● What's implemented:
  - Tiled attention with online softmax (running max + running sum per row)
  - Causal mask (upper-triangular scores set to -inf)
  - PyTorch integration via torch.autograd.Function + nn.Module
  - Works with torch.compile
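The online-softmax recurrence behind the tiling can be sketched in plain Python for a single query row. This is a minimal illustration of the arithmetic, not the Triton kernel: the function names (`flash_attention_row`, `naive_attention_row`) and the `block_size`/`query_index` parameters are hypothetical, and the causal mask is applied by skipping keys past the query position, which is equivalent to setting those scores to -inf. Streaming K/V in blocks should reproduce the reference that materializes the full score row, up to floating-point rounding.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention_row(q: Vector, keys: List[Vector], values: List[Vector],
                        limit: Optional[int] = None) -> Vector:
    """Reference: materialize every score for one query row, then softmax.

    `limit` restricts attention to the first `limit` keys (used for causal checks).
    """
    n = len(keys) if limit is None else limit
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, keys[j]) for j in range(n)]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * values[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(values[0]))]


def flash_attention_row(q: Vector, keys: List[Vector], values: List[Vector],
                        block_size: int = 4,
                        query_index: Optional[int] = None) -> Vector:
    """Same result, but K/V are streamed in blocks so only `block_size`
    scores exist at a time, combined with the online-softmax recurrence.

    If `query_index` is given, keys after that position are masked (causal);
    skipping them is equivalent to setting their scores to -inf.
    """
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    n = len(keys) if query_index is None else query_index + 1
    m = -math.inf       # running max of scores seen so far
    l = 0.0             # running sum of exp(score - m)
    acc = [0.0] * d_v   # running sum of exp(score - m) * v (unnormalized output)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        scores = [scale * dot(q, keys[j]) for j in range(start, end)]
        m_new = max(m, max(scores))
        # Rescale previous statistics to the new max (exp(-inf) == 0 on the first block).
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for j, s in zip(range(start, end), scores):
            p = math.exp(s - m_new)
            l += p
            for d in range(d_v):
                acc[d] += p * values[j][d]
        m = m_new
    return [a / l for a in acc]  # final normalization by the running sum
```

In a GPU kernel, each program instance would typically own a tile of query rows and keep `m`, `l`, and `acc` on-chip while looping over K/V tiles; the Python loop above only mirrors the math, not the memory hierarchy.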

2) Multi-Agent Customer Support System - LangGraph, MCP, FastAPI, PostgreSQL

May 1, 2026 – May 1, 2026

● Built a production-grade multi-agent system on LangGraph: a supervisor routes requests to 3 specialist agents (billing, technical, general), each connected to its own MCP server container. The billing agent runs agentic RAG against ChromaDB before turning to web search. A PostgreSQL checkpointer persists all conversation state across restarts, and responses stream token by token over SSE.
● A LangSmith eval pipeline achieves 100% routing accuracy (20/20). GitHub Actions CI/CD runs the supervisor eval on every push and blocks merges if accuracy drops below 100%. The system also includes input/output guardrails, human-in-the-loop approval for billing responses, and auto-retry from the last checkpoint on failure.
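The merge gate can be illustrated with a small script whose exit code a CI job acts on. This is a hedged sketch under stated assumptions: it does not call LangSmith, and `EVAL_CASES`, `keyword_router`, `routing_accuracy`, and `main` are hypothetical names. The keyword router is only an offline stand-in for the LLM supervisor; a real eval would invoke the actual graph on a labeled dataset.

```python
import sys
from typing import Callable, List, Tuple

# Hypothetical labeled eval set: (user message, expected specialist agent).
EVAL_CASES: List[Tuple[str, str]] = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I upload a file", "technical"),
    ("What are your support hours?", "general"),
]


def routing_accuracy(route: Callable[[str], str],
                     cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the router picks the expected agent."""
    correct = sum(1 for msg, expected in cases if route(msg) == expected)
    return correct / len(cases)


def keyword_router(message: str) -> str:
    """Hypothetical stand-in for the LLM supervisor so the sketch runs offline."""
    text = message.lower()
    if any(w in text for w in ("charge", "refund", "invoice", "bill")):
        return "billing"
    if any(w in text for w in ("crash", "error", "bug", "upload")):
        return "technical"
    return "general"


def main(threshold: float = 1.0) -> int:
    acc = routing_accuracy(keyword_router, EVAL_CASES)
    print(f"routing accuracy: {acc:.2%}")
    # A non-zero exit code fails the CI step; a required status check then blocks the merge.
    return 0 if acc >= threshold else 1


if __name__ == "__main__":
    sys.exit(main())
```

A GitHub Actions step could run this script on every push; if accuracy falls below the threshold, the step exits with status 1 and the failed required check prevents merging.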

3) Production MLOps Platform - AWS EKS, Terraform, Helm

May 1, 2026 – May 1, 2026

● Self-healing MLOps platform on AWS EKS with MLflow quality-gated model promotion, automated drift detection, and hot-reload inference. Terraform provisions the full AWS stack (VPC, EKS, ECR, S3, IAM/OIDC) in a single apply, and GPU nodes autoscale from 0 to 1 based on training demand.
● GitHub Actions CI/CD (test, ECR push, rolling redeploy), PostgreSQL prediction logging with automatic retraining on drift, and live Prometheus + Grafana dashboards; deployed on production AWS infrastructure.

4) LLM Serving Optimization - vLLM, AWQ, GPTQ, Marlin on RTX 4090

May 1, 2026 – May 1, 2026

● Benchmarked 5 inference backends (naive HuggingFace, vLLM, AWQ, GPTQ, GPTQ Marlin) on a real RTX 4090. vLLM continuous batching + PagedAttention delivered an ~80x throughput gain over naive serving (32 to 2,596 tok/s at concurrency=100).
● Serving identical GPTQ weights through the Marlin kernel reached 3,023 tok/s vs 1,718 tok/s with the default kernel: 1.76x throughput from kernel engineering alone, with no change to the quantization algorithm.

5) Distributed LLM Training - Llama-3-8B, DDP, FSDP, Ray Train

May 1, 2026 – May 1, 2026

● Benchmarked DDP, FSDP, and Ray Train for fine-tuning Llama-3-8B on real Vast.ai hardware. FSDP cut per-rank memory by ~79% (77GB to 15.8GB), enabling batch=16; 2-GPU FSDP at $2.10/hr outperformed 4-GPU DDP at $4.20/hr in both throughput and cost.
● Reproduced and solved 8 failure modes, including OOMs, from first principles (gradient bucket pre-allocation, an FSDP sharding bug, FP16 NaNs from SiLU overflow); tracked all experiments with MLflow.

Certifications

NVIDIA-Certified Associate: AI Infrastructure and Operations

NVIDIA

June 25, 2026 – Present

Machine Learning A-Z™: Hands-On Python & R In Data Science

Udemy

June 25, 2026 – Present

Large Language Models (LLMs), Transformers & GPT A-Z

SuperDataScience

June 25, 2026 – Present

Google Data Analytics

Google

June 25, 2026 – Present

Long-Term Agentic Memory With LangGraph!

DeepLearning.AI

June 25, 2026 – Present

Evaluating AI Agents

DeepLearning.AI

June 25, 2026 – Present

Preparatory Course

International Institute of Information Technology Bangalore

June 25, 2026 – Present

IBM Data Science

IBM

June 25, 2026 – Present

Agentforce Champion

Salesforce

June 25, 2026 – Present

AI Agents Fundamentals

Hugging Face

June 25, 2026 – Present

Post-Training of LLMs

DeepLearning.AI

June 25, 2026 – Present

Certificate of completion - A/B Testing in Python

365 Data Science

June 25, 2026 – Present

Deep Learning A-Z™: Hands-On Artificial Neural Networks

Udemy

June 25, 2026 – Present

Data Toolkit

International Institute of Information Technology Bangalore

June 25, 2026 – Present
