HIMANSHU WAGH

AI Engineer

https://www.opentalent.in/himanshu-wagh

AI Engineer with 3+ years in Generative AI & Machine Learning

Realfy Inc

Key Strengths

Deep expertise in real-time AI system architecture and low-latency inference optimization (PyTorch, ONNX, FP16, TensorRT).
Strong practical experience with LLM orchestration, agent design (LangChain/LangGraph), and RAG.
Proficient in building scalable, high-throughput backend systems using FastAPI, asyncio, Docker, and Kubernetes.
Demonstrated ability to deploy and manage GPU-backed inference services with CI/CD and autoscaling.
Experience with multimodal pipelines and streaming APIs for conversational AI.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate demonstrates a strong cultural fit for an AI Engineer role, particularly in a fast-paced, innovation-driven environment. Their projects showcase a proactive approach to tackling complex, real-world problems (e.g., real-time voice AI, low-latency inference). The breadth of technologies used (PyTorch, FastAPI, Kubernetes, LangChain, ONNX, TensorRT, PostgreSQL, Redis, AWS, GCP) indicates adaptability and a continuous learning mindset. The publication also highlights a research-oriented and problem-solving attitude. The focus on performance and scalability aligns well with typical startup or high-growth tech company cultures.

Soft Skills & Operational Fit

The candidate's project and experience descriptions indicate a strong focus on performance optimization, system architecture, and end-to-end solution delivery, which are critical for operational fit in an AI Engineer role. The detailed metrics provided suggest a results-oriented approach. While direct soft skill assessment is not possible from the provided data, the structured project descriptions imply good communication of technical achievements.

About

Experienced AI Engineer with 3.4 years of expertise in architecting real-time AI platforms and developing low-latency inference pipelines. Proficient in Generative AI, Machine Learning, and DevOps technologies, specializing in multimodal AI systems, NLP pipelines, and data-driven solutions. Proven ability to optimize model performance, reduce latency, and ensure high availability in production environments.

Top Skills

Scikit Learn
FastAPI

Education

Michigan Technological University

Master of Science · Data Science

August 1, 2023 – April 1, 2025

Savitribai Phule Pune University

Bachelor of Engineering

January 1, 2017 – January 1, 2021

Experience

Realfy Inc

AI Engineer

October 1, 2025 – May 1, 2026

India

Michigan Technological University

Graduate Research Assistant

December 1, 2023 – October 1, 2025

India

Fyle Technologies

Software Engineering Intern

October 1, 2021 – April 1, 2022

India

Projects

Real-Time Voice AI Agent Platform

June 1, 2026 – Present

Architected a real-time, full-duplex speech-to-speech conversational AI system, enabling low-latency voice interactions across dynamic user workflows. Designed a streaming pipeline (ASR → LLM reasoning → TTS) supporting interruptible conversations and context-aware responses. Implemented LLM-powered agent orchestration using LangChain/LangGraph, enabling tool-calling, memory retention, and multi-step reasoning. Optimized inference latency using PyTorch, ONNX, and FP16 batching, reducing end-to-end voice response time to under 600 ms.

Low-Latency AI Inference & Backend System

June 1, 2026 – Present

Implemented GPU-optimized inference pipelines using PyTorch and TensorRT, reducing latency by 35%. Developed async REST APIs using FastAPI and asyncio, increasing request handling capacity by 70%. Integrated PostgreSQL and Redis caching layers, decreasing response time by 40%. Automated CI/CD pipelines with Docker and Kubernetes, improving deployment frequency by 3x.
