Sourav Roy

AI Engineer

https://www.opentalent.in/sourav-roy-3997346

AI Engineer with 2+ years in LLM Systems & GenAI APIs

YUGA AI

Key Strengths

Extensive experience in designing and deploying production-grade RAG-based LLM systems and multi-agent workflows.
Proven ability to engineer low-latency GenAI APIs (FastAPI) handling high request volumes (10K+ daily requests).
Strong background in LLM fine-tuning with QLoRA, achieving significant VRAM reduction and training efficiency improvements.
Proficient in the full LLM stack, from retrieval and ranking to generation and deployment.
Demonstrated expertise in prompt engineering, context management, and hallucination reduction strategies.
Experience with NLP automation pipelines, lead enrichment, classification, and entity extraction.
Solid understanding of backend development, APIs (FastAPI, REST), WebSockets, and microservices.
Hands-on experience with infrastructure components like AWS, Docker, CI/CD, and PostgreSQL.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's academic projects demonstrate a strong interest and initiative in cutting-edge AI research and development, particularly in GenAI and LLMs. Their experience as a Freelance AI Developer shows adaptability and the ability to deliver solutions across the full AI lifecycle. The current role at YUGA AI, focusing on production RAG systems, aligns well with a fast-paced, innovative AI engineering culture. The breadth of skills and diverse project types (medical QA, travel planning, academic review, fine-tuning platforms) indicate a versatile and curious individual who can contribute to various aspects of an AI team.

Soft Skills & Operational Fit

The candidate's project descriptions and experience highlight strong problem-solving skills, particularly in optimizing performance (latency reduction, VRAM reduction) and building end-to-end AI solutions independently. Their work on multi-agent systems and collaborative prompt engineering indicates an ability to work effectively in complex, team-oriented environments. The focus on real-time systems and production deployment suggests a practical, results-oriented approach.

About

AI/ML Engineer with 3+ years of experience building production-grade LLM systems, RAG pipelines, and agentic AI workflows. Proven track record of shipping low-latency GenAI APIs (FastAPI) serving 10K+ daily requests, fine-tuning large language models with QLoRA, and engineering NLP automation pipelines at scale. Hands-on with the full LLM stack, from retrieval and ranking to generation and deployment. Seeking AI Engineer / ML Engineer / GenAI Developer roles.

Top Skills

RAG, LangChain, Prompt Engineering, Hugging Face Transformers, Microservices

Experience

YUGA AI

AI & ML Engineer

December 1, 2025 – Present

India

New Gen Leads USA

Freelance AI Developer

January 1, 2023 – February 1, 2025

United States

Projects

CURA Retrieval-Augmented Medical QA System

June 25, 2026 – Present

Designed a RAG-based medical question-answering system with grounded, source-cited responses to reduce hallucination in high-stakes clinical queries.

ALADDINGO GenAI Travel Planning System

June 25, 2026 – Present

Developed a generative AI travel planning assistant as a capstone thesis project, integrating LLM reasoning with structured itinerary generation.

Auto-Researcher Multi-Agent Academic Review System

November 1, 2025 – December 1, 2025

Built a graph-based 3-agent LLM pipeline (LangChain + LangGraph) to automate academic literature review across 500+ papers, reducing manual review time significantly. Integrated academic search, PDF parsing, and citation-grounded summarisation, processing 1,000+ documents end-to-end with traceable research outputs. Implemented real-time agent streaming with a live research dashboard, reducing workflow latency by 40%.

AutoLLM Forge LLM Fine-Tuning Platform

September 1, 2025 – October 1, 2025

Built a QLoRA fine-tuning system for 2B to 70B parameter models, achieving 75% VRAM reduction and 30% training efficiency improvement using PyTorch and FastAPI. Developed a FastAPI + Next.js platform supporting 1,000+ Hugging Face models with automated inference pipelines and real-time training monitoring via WebSockets.
