AI Engineer with less than a year in LangGraph & LLMs
Highly skilled AI Engineering Intern specializing in Agentic AI systems, LLM inference optimization, and Semantic RAG pipelines. Proficient in LangGraph, Triton, vLLM, PyTorch, and ONNX Runtime. Experienced in developing custom tools, improving model accuracy, and optimizing memory and throughput for various transformer architectures. Passionate about building robust and efficient AI solutions, as evidenced by successful hackathon wins and impactful project contributions.
Indian Institute of Information Technology and Management, Gwalior
Integrated M.Tech · Information Technology
July 1, 2022 – July 1, 2027
synapse Health Systems
AI Engineering Intern
June 1, 2025 – August 1, 2025
India
slm-turbo
May 1, 2026 – June 1, 2026
Built an automated LLM inference optimizer that profiles GPU hardware via roofline analysis in < 2 s without loading weights, classifies prefill/decode bottlenecks, and generates version-controlled optimization recipes for 20+ HuggingFace transformer architectures (1B-70B parameters). Implemented a custom Triton kernel for fused asymmetric KV cache dequantization (4-bit K, 2-bit V) with in-register reconstruction, reducing decode-phase memory traffic by 6x and KV cache footprint by 75%, with a perplexity safety gate that auto-disables the kernel if quality drops by more than 5%. Designed a recipe engine that orchestrates 4 optimizers (KV quantization, radix prefix caching, chunked prefill, attention backend selection) with capability-gated auto-selection per (model, GPU) pair, achieving 2.1x throughput and 45% lower p99 latency for TinyLlama-1.1B on a GTX 1650 vs. stock vLLM. Integrated with vLLM via a non-forking adapter that injects at the AttentionBackend and CacheConfig boundaries, with graceful fallback to PyTorch SDPA on Triton compilation failure, and zero crashes across 50+ serve cycles on sm_75 hardware.
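Roofline analysis, as mentioned in the slm-turbo description, classifies a workload by comparing its arithmetic intensity (FLOPs per byte moved) with the GPU's ridge point (peak FLOP/s divided by peak memory bandwidth). The following is a minimal sketch of that idea, not the project's actual implementation: the `GPUSpec` class and helper names are hypothetical, the GPU figures are illustrative placeholders rather than measured GTX 1650 specs, and only a single FP16 linear layer is modeled.

```python
from dataclasses import dataclass


@dataclass
class GPUSpec:
    """Hypothetical hardware description; values are supplied by the caller."""
    peak_tflops: float        # peak FP16 throughput, TFLOP/s
    mem_bandwidth_gbs: float  # peak DRAM bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOPs/byte) where the compute roof meets the memory roof.
        return (self.peak_tflops * 1e12) / (self.mem_bandwidth_gbs * 1e9)


def linear_intensity(tokens: int, d_in: int, d_out: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for a [tokens x d_in] @ [d_in x d_out] matmul.

    Counts one read of the weights and input activations plus one write of the output.
    """
    flops = 2 * tokens * d_in * d_out
    bytes_moved = bytes_per_elem * (d_in * d_out + tokens * d_in + tokens * d_out)
    return flops / bytes_moved


def classify(gpu: GPUSpec, tokens: int, d_in: int, d_out: int) -> str:
    """Label a layer as memory- or compute-bound under the roofline model."""
    if linear_intensity(tokens, d_in, d_out) < gpu.ridge_point:
        return "memory-bound"
    return "compute-bound"


# Illustrative numbers only, not a measured spec sheet.
gpu = GPUSpec(peak_tflops=6.0, mem_bandwidth_gbs=128.0)
print(classify(gpu, tokens=1, d_in=2048, d_out=2048))    # decode step
print(classify(gpu, tokens=512, d_in=2048, d_out=2048))  # prefill chunk
```

With one token per step (decode), the layer reaches roughly 1 FLOP/byte, far below the ridge point of about 47 FLOPs/byte, which is why decode is typically memory-bound; a 512-token prefill of the same layer lands well above it.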
zerch
March 1, 2026 – April 1, 2026
Engineered a Semantic RAG pipeline for logs, replacing keyword matching with dense vector retrieval to improve anomaly detection accuracy by 90%. Built a high-throughput embedding pipeline (ONNX Runtime, MiniLM-L6-v2) that converts text to 384-dim vectors and ingests 1,000+ logs/sec into Qdrant. Developed a Model Context Protocol (MCP) server exposing 'search_logs' as an agentic tool, enabling LLM agents to perform autonomous function calling via JSON-RPC. Integrated the Groq API for LLM-powered context generation, feeding the retrieved log entries to the model as context to reduce incident debugging time by 70%.
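A minimal sketch of the retrieval-plus-tool pattern described for zerch: a `search_logs` handler ranks stored log lines by cosine similarity to the query and is reached through a JSON-RPC 2.0 `tools/call` request of the shape MCP uses. Everything here is an assumption for illustration: the bag-of-words `embed` function is a toy stand-in for MiniLM-L6-v2, the in-memory `INDEX` stands in for a Qdrant collection, and the handler is not a full MCP server (no initialization handshake or transport).

```python
import json
import math

# Toy vocabulary for a bag-of-words "embedding"; a stand-in (assumption) for a
# real sentence encoder such as MiniLM-L6-v2, which outputs 384-dim vectors.
VOCAB = ["timeout", "database", "disk", "full", "login", "failed", "connection"]


def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical in-memory index standing in for a Qdrant collection.
LOGS = [
    "database connection timeout after 30s",
    "disk full on /var/log",
    "user login failed for admin",
]
INDEX = [(line, embed(line)) for line in LOGS]


def search_logs(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k stored log lines most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [line for line, _ in ranked[:top_k]]


def _error(req_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC 2.0 'tools/call' request to the search_logs tool."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return _error(req.get("id"), -32601, "Method not found")
    params = req.get("params", {})
    if params.get("name") != "search_logs":
        return _error(req.get("id"), -32602, "Unknown tool")
    hits = search_logs(**params.get("arguments", {}))
    content = [{"type": "text", "text": hit} for hit in hits]
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": {"content": content}})
```

Sending a `tools/call` request with `{"name": "search_logs", "arguments": {"query": "database timeout", "top_k": 1}}` returns the database timeout line as a single text content item, which an agent can then pass to the LLM as context.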
Won HackByte Hackathon
IIITDM Jabalpur
June 1, 2026 – Present
Ranked in the Top 3 teams at TechMinds Hackathon
IITM
June 1, 2026 – Present
Cultural Fit Analysis
The candidate's involvement in multiple hackathons and diverse projects (LLM optimization, RAG, agentic AI) demonstrates a proactive, innovative, and collaborative spirit. The focus on performance optimization and robust system design aligns well with a high-performance engineering culture. The breadth of skills and technologies used suggests a strong desire for continuous learning and adaptability, which are key for cultural fit in dynamic AI roles.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a strong problem-solving aptitude, evidenced by optimizing LLM inference and improving RAG accuracy. The detailed technical explanations suggest an ability to communicate complex ideas clearly. Participation in hackathons implies teamwork and the ability to perform under pressure. The internship experience also points to self-management and adaptability.