onsite

Sr. Lead AI Engineer Inference Optimization, FM Hosting, AI Platform - Capital One

AI Engineer

Lead the design and deployment of high‑performance AI inference pipelines on AWS, optimizing models for speed and cost while ensuring robust FM hosting and platform integration for real‑time banking applications.

About the role

Key Responsibilities

Architect and scale end‑to‑end AI inference solutions on AWS, focusing on latency, throughput, and cost efficiency.
Lead model optimization initiatives, including quantization, pruning, and hardware‑aware compilation for production workloads.
Collaborate with data scientists to translate research prototypes into production‑ready services, ensuring reproducibility and maintainability.
Design and maintain a unified AI platform that supports model versioning, monitoring, and automated rollback.
Drive best practices for FM hosting, including secure data handling, compliance, and auditability.

Requirements

10+ years of experience in AI/ML engineering with a strong focus on inference optimization.
Proficiency in Python, TensorFlow/PyTorch, and AWS services (SageMaker, ECS, Lambda, EKS).
Deep understanding of model compression techniques and hardware acceleration (GPU, FPGA, ASIC).
Experience building and maintaining AI platforms and model governance pipelines.
Excellent communication skills and a track record of leading cross‑functional teams.

Skills

Python, machine learning, AWS, TensorFlow

Company: Capital One

Department: Engineering

Location: Prince, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Posted June 22, 2026