onsite

Senior Lead AI Engineer, Foundation Model Hosting & LLM Inference - Capital One

AI Engineer

Lead the design, deployment, and scaling of foundation model hosting and LLM inference pipelines, driving reliable, production‑grade AI solutions using Python, PyTorch, Kubernetes, and AWS.

About the role

Key Responsibilities

Architect and implement high‑performance, low‑latency hosting platforms for foundation models and large language model inference.
Lead a cross‑functional team to build end‑to‑end MLOps pipelines, including model versioning, monitoring, and automated scaling.
Collaborate with data scientists to optimize model architectures, quantization, and serving strategies for production workloads.
Design and maintain cloud‑native infrastructure on AWS, leveraging Kubernetes (Amazon EKS) and serverless services for cost‑effective scalability.
Establish best practices for model governance, security, and responsible AI, ensuring compliance with regulatory standards.

Requirements

7+ years of software engineering experience with a focus on AI/ML systems, including deep expertise in Python and PyTorch.
Proven track record designing and operating large‑scale model serving platforms on Kubernetes and AWS.
Strong background in MLOps, CI/CD, containerization, and monitoring of AI workloads.
Experience with large language models, model optimization techniques (quantization, pruning), and inference performance tuning.
Excellent leadership, communication, and mentorship skills to guide senior engineers and data scientists.

Skills

Python, PyTorch, Kubernetes, AWS, MLOps

Company: Capital One

Department: Research

Location: Pimmit, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted: June 22, 2026