MLOps Platform Engineer (SageMaker) - Magnit Global
DevOps Engineer
Onsite
About the role
Lead the design, deployment, and maintenance of scalable MLOps pipelines on Amazon SageMaker, using AWS services, Python, Docker, and Terraform to deliver robust, production‑grade machine learning solutions.
Key Responsibilities
- Architect and implement end‑to‑end MLOps workflows on Amazon SageMaker, covering model training, hyperparameter tuning, and deployment to SageMaker endpoints.
- Develop and maintain infrastructure as code (IaC) in Terraform to provision and manage AWS resources such as SageMaker notebook instances, training jobs, and the SageMaker Model Registry.
- Build CI/CD pipelines with GitHub Actions or AWS CodePipeline that automate model versioning, testing, and continuous delivery.
- Collaborate with data scientists and software engineers to optimize model performance, resource utilization, and cost efficiency.
- Monitor production models, troubleshoot issues, and implement automated logging and alerting with Amazon CloudWatch and data and model quality monitoring with SageMaker Model Monitor.
Requirements
- 10–15 years of software engineering experience with a focus on cloud infrastructure or ML platform operations.
- 5+ years of hands‑on experience with AWS, including deep expertise in Amazon SageMaker (SageMaker Studio, Studio Classic, and Studio Lab).
- Proficiency in Python for data processing, model development, and automation scripts.
- Strong knowledge of containerization (Docker) and infrastructure as code (Terraform or CloudFormation).
- Experience with CI/CD tooling, monitoring, and observability in a production ML environment.
Skills
AWS, Python, Docker, Terraform