On-site
AI DevOps Engineer - University of Washington
DevOps Engineer
Design, implement, and maintain scalable AI/ML infrastructure using cloud services, container orchestration, and automated pipelines to accelerate research and production workloads.
About the role
Key Responsibilities
- Develop and operate end‑to‑end CI/CD pipelines for training, testing, and deploying machine‑learning models.
- Design, provision, and manage cloud‑native infrastructure (AWS, GCP, or Azure) supporting AI workloads.
- Containerize ML applications with Docker and orchestrate them using Kubernetes for reproducibility and scalability.
- Collaborate with data scientists and researchers to translate prototypes into production‑ready services.
- Implement monitoring, logging, and alerting solutions to ensure reliability and performance of AI systems.
Requirements
- Strong programming skills in Python and experience with ML libraries (e.g., TensorFlow, PyTorch).
- Hands‑on experience with container technologies (Docker) and orchestration platforms (Kubernetes).
- Proficiency in building CI/CD workflows using tools such as Jenkins, GitLab CI, or GitHub Actions.
- Solid understanding of cloud platforms (AWS preferred) and infrastructure‑as‑code tools (e.g., Terraform, CloudFormation).
- Experience with Linux environments, networking, and performance tuning for AI workloads.
Skills
Python, Docker, Kubernetes, CI/CD, AWS, machine learning