Remote
Deployment Lead, AI Infrastructure, Google Cloud
Software Engineer
Lead the design, deployment, and operational excellence of AI/ML infrastructure on Google Cloud, driving stakeholder engagement, automation, and performance monitoring for enterprise‑scale solutions.
About the role
Key Responsibilities
- Architect, build, and scale AI/ML infrastructure on Google Cloud, leveraging Kubernetes, Terraform, and CI/CD pipelines to support data science and model serving workloads.
- Partner with internal and external stakeholders to define deployment requirements, manage timelines, and ensure successful delivery of AI solutions.
- Implement robust monitoring, logging, and incident response processes to maintain high availability and performance of AI services.
- Automate provisioning, configuration, and lifecycle management of resources using Python scripts and infrastructure‑as‑code practices.
- Mentor and guide cross‑functional teams, fostering best practices in security, cost optimization, and operational excellence.
Requirements
- Bachelor’s degree or equivalent practical experience, plus at least 10 years of experience troubleshooting complex technical issues.
- Minimum 7 years of experience in customer management and stakeholder engagement for large‑scale deployments.
- 5+ years of hands‑on experience with AI/ML infrastructure, including model training, serving, and pipeline orchestration on Google Cloud.
- Proficiency in Kubernetes, Terraform, Python, and CI/CD tools (e.g., Cloud Build, Jenkins).
- Strong Linux systems knowledge and a track record of implementing monitoring, alerting, and incident management frameworks.
Skills
Kubernetes, Terraform, Python, CI/CD, Linux