onsite
Lead AI Engineer, Foundation Model Hosting & LLM Inference - Capital One
AI Engineer
The Lead AI Engineer is responsible for designing, deploying, and scaling foundation model hosting and LLM inference pipelines using Python, PyTorch or TensorFlow, Kubernetes, Docker, and AWS, while driving best‑in‑class MLOps practices.
About the role
Key Responsibilities
- Architect and implement high‑performance, low‑latency inference services for large language models and foundation models.
- Design, containerize, and orchestrate model serving workloads on Kubernetes clusters in cloud environments.
- Develop end‑to‑end MLOps pipelines for model versioning, monitoring, and automated scaling.
- Collaborate with data scientists and product teams to integrate AI capabilities into customer‑facing applications.
- Ensure reliability, security, and compliance of AI systems in a regulated financial services context.
Requirements
- 5+ years of hands‑on experience building and deploying AI/ML systems at scale.
- Strong proficiency in Python and deep‑learning frameworks such as PyTorch or TensorFlow.
- Extensive experience with containerization (Docker) and orchestration (Kubernetes) on AWS or equivalent cloud platforms.
- Proven track record implementing MLOps practices, including CI/CD, model monitoring, and automated rollouts.
- Solid understanding of distributed systems, performance optimization, and security best practices for AI workloads.
Skills
Python, PyTorch, TensorFlow, Kubernetes, Docker, AWS, MLOps