On-site
ML Compute Efficiency Automation Engineer, Infrastructure & Planning - Apple
QA Engineer
Lead the automation of ML compute efficiency: design scalable infrastructure and tooling that accelerate model training and inference across Apple’s platform, using Python, AWS, Kubernetes, and advanced optimization techniques.
About the role
Key Responsibilities
- Design and implement automated pipelines that optimize ML model training and inference workloads on shared infrastructure.
- Collaborate with ML teams to profile models, identify bottlenecks, and apply compute‑efficiency techniques such as mixed‑precision training, model pruning, and dynamic batching.
- Build and maintain scalable, cloud‑native infrastructure using AWS services, Kubernetes, and IaC tools to support high‑throughput ML workloads.
- Develop monitoring and alerting solutions to track resource utilization, latency, and cost metrics across the ML ecosystem.
- Drive continuous improvement initiatives, documenting best practices and contributing to platform tooling that reduces operational friction for software engineers.
Requirements
- 5+ years of experience in ML infrastructure or DevOps roles, with a strong focus on compute efficiency.
- Proficiency in Python, AWS, Kubernetes, and infrastructure automation (Terraform, Helm).
- Hands‑on experience with ML profiling, model optimization, and performance tuning.
- Excellent problem‑solving skills and ability to work cross‑functionally with ML, platform, and software engineering teams.
- Strong communication skills and a track record of delivering production‑ready tooling.
Skills
Python, machine learning, AWS, Kubernetes