Remote
Agentic AI / AI Ops Engineer - Platform Engineering - Caterpillar
MLOps Engineer
Lead the design and delivery of Agentic AI and AI Ops solutions on a scalable platform, driving intelligent automation and reliability with Python, AWS, Kubernetes, and MLOps practices.
About the role
Key Responsibilities
- Architect and implement Agentic AI and AI Ops services that automate platform operations and enhance reliability.
- Collaborate with data scientists and platform teams to integrate ML models into production pipelines.
- Design scalable, secure infrastructure on AWS using Kubernetes, Terraform, and CI/CD pipelines.
- Develop monitoring, alerting, and self-healing mechanisms for AI-driven operations.
- Provide technical mentorship and code reviews to ensure best practices and high code quality.
Requirements
- 5+ years of experience in platform engineering with a focus on AI Ops or related domains.
- Proficiency in Python, AWS services (EKS, Lambda, S3, CloudWatch), and Kubernetes.
- Hands-on experience with MLOps tools (MLflow, Kubeflow) and CI/CD pipelines.
- Strong understanding of distributed systems, observability, and security best practices.
- Excellent communication skills and a collaborative mindset.
Skills
Python, AWS, Kubernetes