onsite
Sr. Lead AI Engineer, Inference Optimization, FM Hosting, AI Platform - Capital One
AI Engineer
Lead the design and deployment of high‑performance AI inference pipelines on AWS, optimizing models for speed and cost while ensuring robust foundation model (FM) hosting and platform integration for real‑time banking applications.
About the role
Key Responsibilities
- Architect and scale end‑to‑end AI inference solutions on AWS, focusing on latency, throughput, and cost efficiency.
- Lead model optimization initiatives, including quantization, pruning, and hardware‑aware compilation for production workloads.
- Collaborate with data scientists to translate research prototypes into production‑ready services, ensuring reproducibility and maintainability.
- Design and maintain a unified AI platform that supports model versioning, monitoring, and automated rollback.
- Drive best practices for FM hosting, including secure data handling, compliance, and auditability.
Requirements
- 10+ years of experience in AI/ML engineering with a strong focus on inference optimization.
- Proficiency in Python, TensorFlow/PyTorch, and AWS services (SageMaker, ECS, Lambda, EKS).
- Deep understanding of model compression techniques and hardware acceleration (GPU, FPGA, ASIC).
- Experience building and maintaining AI platforms and model governance pipelines.
- Excellent communication skills and a track record of leading cross‑functional teams.
Skills
Python, machine learning, AWS, TensorFlow