Remote
AI Staff Platform Engineer - Marks & Spencer
DevOps Engineer
Lead the design and delivery of AI platform services, building scalable, cloud‑native infrastructure with Python, AWS, and Kubernetes to support machine learning workloads and continuous delivery pipelines.
About the role
Key Responsibilities
- Architect and implement AI platform services on AWS, ensuring high availability, security, and scalability for machine learning workloads.
- Develop and maintain CI/CD pipelines that use Kubernetes-based container orchestration to automate model deployment and data pipeline workflows.
- Collaborate with data scientists and product teams to translate ML model requirements into robust, production‑ready infrastructure.
- Monitor platform performance, troubleshoot incidents, and implement proactive optimizations to meet SLAs.
- Document platform architecture, best practices, and operational procedures for internal teams.
Requirements
- Proven experience building cloud‑native AI/ML platforms on AWS with services such as SageMaker, ECS/EKS, and Lambda.
- Strong proficiency in Python and container technologies (Docker, Kubernetes).
- Hands‑on expertise in CI/CD tooling (GitHub Actions, Jenkins, ArgoCD) and infrastructure as code (Terraform, CloudFormation).
- Solid understanding of MLOps principles, model versioning, and data pipeline orchestration.
- Excellent problem‑solving skills and the ability to work collaboratively in a fast‑paced environment.
Skills
Python, AWS, Kubernetes, Machine Learning, CI/CD