Remote
AI Infrastructure Engineer - Crowe LLP
DevOps Engineer
Design, build, and maintain scalable AI infrastructure on cloud platforms, leveraging Kubernetes, Docker, Terraform, and CI/CD pipelines to support data science and machine learning workloads.
About the role
Key Responsibilities
- Architect and implement cloud‑native infrastructure for AI/ML workloads, ensuring high availability, security, and cost efficiency.
- Develop and maintain container orchestration solutions using Kubernetes and Docker to streamline model training and inference pipelines.
- Automate provisioning and configuration management with Terraform and CI/CD tools (e.g., Jenkins, GitHub Actions) to support rapid deployment cycles.
- Collaborate with data scientists and software engineers to integrate AI models into production environments, optimizing performance and scalability.
- Monitor, troubleshoot, and continuously improve infrastructure performance using observability tools and cloud services (AWS CloudWatch, Prometheus, Grafana).
Requirements
- 3+ years of experience designing and operating cloud infrastructure, preferably on AWS.
- Strong proficiency in Python for scripting and automation.
- Hands‑on experience with Kubernetes, Docker, and Infrastructure‑as‑Code tools such as Terraform.
- Familiarity with CI/CD pipelines and DevOps best practices for AI/ML deployments.
- Understanding of machine learning lifecycle concepts and MLOps tooling.
Skills
python, kubernetes, docker, terraform, aws, cicd