onsite
Staff Machine Learning Engineer, Compute - General Motors
ML Engineer
Lead the design and scaling of a cloud‑agnostic compute platform that powers autonomous vehicle AI and other high‑performance ML workloads, driving reliability, cost efficiency, and rapid innovation across the organization.
About the role
Key Responsibilities
- Architect, build, and maintain a scalable, cloud‑agnostic compute backend that supports large‑scale ML training and inference for autonomous vehicle and other AI workloads.
- Collaborate with data scientists, ML engineers, and infrastructure teams to optimize GPU utilization, reduce training times, and lower operational costs.
- Implement robust CI/CD pipelines, monitoring, and observability for ML workloads, ensuring high availability and rapid deployment cycles.
- Drive research and adoption of cutting‑edge distributed training frameworks (e.g., Horovod, DeepSpeed) and container tooling (Docker for packaging, Kubernetes for orchestration).
- Mentor junior engineers, conduct code reviews, and champion best practices in software engineering and ML operations.
Requirements
- 10+ years of software engineering experience with a strong focus on ML infrastructure.
- Deep expertise in Python, distributed systems, and GPU‑accelerated deep learning frameworks (TensorFlow, PyTorch).
- Proven track record building production‑grade, cloud‑agnostic compute platforms that run across AWS, GCP, or Azure.
- Strong knowledge of containerization, Kubernetes, and CI/CD tooling.
- Excellent communication skills and a collaborative mindset.
Skills
Python, machine learning, deep learning