remote
Principal LLM Engineer, Productionization - a5labs
LLM Engineer
Lead the end‑to‑end productionization of large language models, designing scalable MLOps pipelines, deploying on cloud infrastructure, and ensuring robust monitoring and performance tuning for enterprise AI solutions.
About the role
Key Responsibilities
- Architect and implement production‑ready pipelines for training, fine‑tuning, and serving large language models at scale.
- Design and maintain CI/CD workflows, containerization (Docker) and orchestration (Kubernetes) for model deployment.
- Collaborate with data scientists to translate research prototypes into reliable, high‑throughput services.
- Implement monitoring, logging, and alerting to track model performance, detect drift, and support compliance.
- Optimize resource utilization and cost across cloud platforms (AWS, GCP, or Azure).
Requirements
- 10+ years of software engineering experience with a focus on AI/ML systems.
- Deep expertise in large language models, transformer architectures, and related frameworks (e.g., Hugging Face, TensorFlow, PyTorch).
- Proven track record in MLOps, containerization, and cloud deployment at scale.
- Strong programming skills in Python and familiarity with infrastructure as code.
- Excellent problem‑solving, communication, and leadership abilities.
Skills
mlops, python, docker, kubernetes