Remote
Staff Machine Learning Infrastructure Engineer - General Motors (GM)
ML Engineer
About the role
Lead the design and scaling of machine‑learning platforms for autonomous driving, building robust, cloud‑native infrastructure with Python, Kubernetes, AWS, and modern ML frameworks.
Key Responsibilities
- Architect and implement scalable, cloud‑native ML infrastructure to support large‑scale autonomous‑driving model training and inference.
- Design, deploy, and manage Kubernetes clusters and CI/CD pipelines that enable rapid experimentation and production rollout.
- Collaborate with data scientists and software engineers to integrate TensorFlow, PyTorch, and other frameworks into a unified platform.
- Optimize resource utilization and cost on AWS, leveraging services such as SageMaker, EKS, and serverless compute.
- Establish best practices for MLOps, monitoring, and observability across distributed training jobs.
Requirements
- 10+ years of software engineering experience with a focus on large‑scale ML infrastructure.
- Deep expertise in Python, Kubernetes, and AWS cloud services.
- Hands‑on experience building and maintaining CI/CD pipelines for ML workloads.
- Strong knowledge of TensorFlow, PyTorch, and distributed training techniques.
- Proven ability to lead cross‑functional teams and drive technical strategy in a fast‑moving environment.
Skills
Python, Kubernetes, AWS, TensorFlow, PyTorch, CI/CD