onsite

Lead AI Engineer, Foundation Model Hosting & LLM Inference - Capital One

AI Engineer

Lead AI Engineer responsible for designing, deploying, and scaling foundation model hosting and LLM inference pipelines using Python, PyTorch/TensorFlow, Kubernetes, Docker, and AWS while driving best‑in‑class MLOps practices.

About the role

Key Responsibilities

Architect and implement high‑performance, low‑latency inference services for large language models and foundation models.
Design, containerize, and orchestrate model serving workloads on Kubernetes clusters in cloud environments.
Develop end‑to‑end MLOps pipelines for model versioning, monitoring, and automated scaling.
Collaborate with data scientists and product teams to integrate AI capabilities into customer‑facing applications.
Ensure reliability, security, and compliance of AI systems in a regulated financial services context.

Requirements

5+ years of hands‑on experience building and deploying AI/ML systems at scale.
Strong proficiency in Python and deep‑learning frameworks such as PyTorch or TensorFlow.
Extensive experience with containerization (Docker) and orchestration (Kubernetes) on AWS or equivalent cloud platforms.
Proven track record implementing MLOps practices, including CI/CD, model monitoring, and automated rollouts.
Solid understanding of distributed systems, performance optimization, and security best practices for AI workloads.

Skills

Python, PyTorch, TensorFlow, Kubernetes, Docker, AWS, MLOps

Company: Capital One

Department: Research

Location: Pimmit, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted June 22, 2026