Onsite
AI Infrastructure Manager - Postman
Software Engineer
About the role
Lead the design, deployment, and scaling of AI/ML infrastructure using cloud services, Kubernetes, and automation tools to support data science and production workloads.
Key Responsibilities
- Architect and implement scalable AI/ML platforms on AWS, leveraging Kubernetes, Docker, and serverless services.
- Automate provisioning and configuration of infrastructure using Terraform and CI/CD pipelines.
- Collaborate with data science and engineering teams to optimize model training, inference, and monitoring workflows.
- Establish best practices for security, cost management, and reliability across AI workloads.
- Mentor and guide junior engineers, fostering a culture of continuous improvement and operational excellence.
Requirements
- 5+ years of experience managing cloud‑native infrastructure for AI/ML or data‑intensive applications.
- Strong proficiency in Python and scripting for automation.
- Hands‑on expertise with Kubernetes, Docker, Terraform, and CI/CD tools (e.g., Jenkins, GitHub Actions).
- Deep understanding of AWS services (EKS, SageMaker, S3, Lambda) and cost‑optimization strategies.
- Experience implementing monitoring, logging, and security controls for production AI systems.
Skills
Python, Kubernetes, Terraform, AWS, machine learning, CI/CD, Docker