Sabari Balan

AI Engineer

https://www.opentalent.in/sabari-balan

Key Strengths

Extensive hands-on experience in designing and implementing advanced RAG pipelines and agentic systems for real-world applications, particularly in healthcare.
Proficient in LLM orchestration, multimodal AI systems (TTS, STT, Text, Embeddings), and vector database architecture (Milvus, pgvector).
Strong background in building scalable AI infrastructure, including model routing, PII guardrails, prompt management, and canary deployments.
Demonstrated ability to deliver end-to-end solutions, from architecture to deployment and observability (Prometheus, Grafana).
Experience with a diverse set of AI frameworks and tools (LangChain, LangGraph, OpenAI API, Groq, Hugging Face, AWS Bedrock, Azure AI).

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's project portfolio showcases a strong alignment with the target role of an AI Engineer, particularly in advanced RAG, agentic systems, and LLM orchestration. The mix of professional and self-initiated projects demonstrates initiative, continuous learning, and a passion for AI. Their experience in healthcare AI indicates an ability to apply technical skills to complex, high-impact domains. The breadth of technologies and frameworks used suggests adaptability and a willingness to explore different solutions, which is a good indicator of cultural fit within a dynamic engineering environment.

Soft Skills & Operational Fit

The candidate demonstrates strong problem-solving skills through their project descriptions, tackling complex issues like clinical hallucination reduction, cost optimization, and real-time data integration. Their experience with canary-style rollouts and observability indicates a focus on robust, production-ready systems and operational excellence. The self-initiated 'Enterprise Repo-RAG' project highlights a proactive learning attitude and deep technical curiosity, which are valuable for cultural fit in an innovative AI team.

About

AI Engineer with production experience building healthcare AI platforms at Zauto AI. Designed and shipped a multi-channel AI Healthcare Agent (Voice, WhatsApp, Web) automating appointment booking and real-time patient record retrieval using custom RAG pipelines and LLM function calling across 20+ hospitals. Led on-site deployments integrating real-time doctor-patient conversation AI into live EMR systems. Independently architected an enterprise-grade codebase intelligence system using advanced retrieval techniques including Multi-Stage Retrieval, BGE-Reranker v2, AST-based chunking, and semantic caching – achieving 95%+ retrieval precision while reducing token costs by 30%. Hands-on with MCP (Model Context Protocol) for tool-augmented agentic workflows, LangChain/LangGraph orchestration, and production AI infrastructure built with Python, FastAPI, and NestJS/Node.js. Strong understanding of LLM orchestration, multimodal AI systems (TTS, STT, Embeddings), vector database architecture, and scalable AI application design.

Top Skills

Multimodal AI, TTS, Text, Groq, Node.js

Projects

Alhena - Healthcare AI Automation Platform

June 24, 2026 – Present

Engineered end-to-end Voice-EMR automation retrieving lab reports, prescriptions, and visit history via custom RAG pipelines on Milvus — handling 500+ patient queries daily across Telephony and WhatsApp with sub-second retrieval. Implemented Tiktoken-based dynamic chunking with metadata filters for patient-record queries — achieving ~40% reduction in clinical hallucination rate vs fixed-size chunking baseline. Integrated real-time EMR data access via LLM Function Calling — enabling fully automated appointment booking and dynamic service updates, eliminating manual staff intervention for routine queries.

LLM Gateway - Centralised AI Model Management Platform

June 24, 2026 – Present

Architected a centralised LLM gateway with unified multi-provider routing across Text, TTS, STT, and Embedding modalities, supporting OpenAI, Azure, ElevenLabs, Sarvam, Deepgram, and others with real-time streaming. Implemented capability-aware model routing with NER-based guardrails, per-org API key isolation, and token and rate limiting, cutting average cost-per-query by ~20% through intelligent provider arbitrage.

AI Prompt Manager - Agent-Oriented Prompt Orchestration Platform

June 24, 2026 – Present

Engineered an agent-based prompt orchestration system supporting multi-LLM routing, semantic version control, and runtime configuration hot-swaps — managing 50+ active prompt templates across 4 AI agents in production. Implemented a canary-style rollout mechanism for gradual traffic shifting between prompt versions — reducing mean rollback time from hours to under 5 minutes and eliminating prompt regression incidents in production.

Meta WhatsApp Integration Platform - In-House Communication Infrastructure

June 24, 2026 – Present

Single-handedly delivered an in-house Meta WhatsApp Business platform replacing third-party SaaS providers — with full messaging, template, and account management capability. Built event-driven webhook infrastructure processing 2,000+ events/min at sub-200ms delivery latency — eliminating recurring external vendor costs.

Enterprise Repo-RAG — Agentic Codebase Intelligence System

June 24, 2026 – Present

Architected an Agentic RAG system using LangChain AgentExecutor with Tavily Web Search and custom code-tools for multi-source intelligence across KT Docs, Source Code, and Live Documentation — built independently to deepen advanced RAG and agentic workflow knowledge. Built a Multi-Stage Retrieval pipeline (Parent Document Retrieval + Multi-Vector) refined by BGE-Reranker v2 — achieving 95%+ retrieval precision on codebase queries. Implemented AST-based language-aware chunking, Session Namespacing for multi-tenant retrieval, and pgvector-backed Semantic Caching — reducing token costs by 30%. Deployed open-source model inference (Llama 3.1, Qwen) via Hugging Face and Groq achieving sub-300ms latency; built observability stack with Prometheus and Grafana monitoring P99 latency and RAGAS Faithfulness/Relevancy scores in real time.

Certifications

Foundation: Introduction to LangChain - Python

LangChain Academy

May 1, 2026 – Present

Quickstart: LangChain Essentials - Python

LangChain Academy

May 1, 2026 – Present

Foundation: Introduction to LangGraph - Python

LangChain Academy

May 1, 2026 – Present
