onsite
Senior Lead AI Engineer, Foundation Model Hosting & LLM Inference - Capital One
AI Engineer
Lead the design, deployment, and scaling of foundation model hosting and LLM inference pipelines, driving reliable, production‑grade AI solutions using Python, PyTorch, Kubernetes, and AWS.
About the role
Key Responsibilities
- Architect and implement high‑performance, low‑latency hosting platforms for foundation models and large language model inference.
- Lead a cross‑functional team to build end‑to‑end MLOps pipelines, including model versioning, monitoring, and automated scaling.
- Collaborate with data scientists to optimize model architectures, quantization, and serving strategies for production workloads.
- Design and maintain cloud‑native infrastructure on AWS, leveraging Kubernetes on EKS and serverless services for cost‑effective scalability.
- Establish best practices for model governance, security, and responsible AI, ensuring compliance with regulatory standards.
Requirements
- 7+ years of software engineering experience with a focus on AI/ML systems, including deep expertise in Python and PyTorch.
- Proven track record designing and operating large‑scale model serving platforms on Kubernetes and AWS.
- Strong background in MLOps, CI/CD, containerization, and monitoring of AI workloads.
- Experience with large language models, model optimization techniques (quantization, pruning), and inference performance tuning.
- Excellent leadership, communication, and mentorship skills to guide senior engineers and data scientists.
Skills
Python, PyTorch, Kubernetes, AWS, MLOps