Remote
Senior LLM Release Engineer, Private Cloud Compute - Apple
Research Engineer
Senior engineer responsible for designing, testing, and scaling the deployment pipeline for large language models on a private cloud platform, ensuring reliable delivery to hundreds of millions of iOS devices.
About the role
Key Responsibilities
- Architect and implement end‑to‑end release pipelines for LLMs across a private cloud compute environment.
- Design automated testing, validation, and monitoring frameworks to guarantee model performance and reliability at global scale.
- Collaborate with infrastructure, security, and product teams to define standards for distributed deployment, scaling, and rollback strategies.
- Optimize resource utilization and cost efficiency using container orchestration, infrastructure‑as‑code, and CI/CD best practices.
- Drive incident response, root‑cause analysis, and continuous improvement of release processes.
Requirements
- 5+ years of experience in release engineering or site reliability engineering for large‑scale distributed systems.
- Strong proficiency in Python and scripting for automation.
- Hands‑on expertise with Kubernetes, Docker, and Terraform for managing private cloud resources.
- Deep understanding of CI/CD pipelines, automated testing, and monitoring in high‑throughput environments.
- Proven ability to troubleshoot complex, multi‑region deployments and deliver reliable services to a massive user base.
Skills
Python, Kubernetes, Docker, CI/CD, Terraform