Onsite
AI Operations Engineer - Salesforce
Systems Engineer
The AI Operations Engineer deploys, monitors, and scales AI/ML models in production, using Python, Kubernetes, Docker, CI/CD pipelines, and AWS services to ensure high availability and performance.
About the role
Key Responsibilities
- Design, build, and maintain scalable AI/ML model deployment pipelines using Docker and Kubernetes.
- Implement CI/CD workflows for model versioning, testing, and automated rollouts.
- Monitor model performance, latency, and resource utilization in production, and troubleshoot issues.
- Collaborate with data scientists and software engineers to integrate new models into existing services.
- Automate infrastructure provisioning and configuration using IaC tools (e.g., Terraform) on AWS.
Requirements
- Proven experience with Python, containerization (Docker), and container orchestration (Kubernetes).
- Strong background in CI/CD pipeline design and implementation.
- Hands‑on experience with AWS services (EKS, ECS, S3, Lambda).
- Familiarity with monitoring tools (Prometheus, Grafana, CloudWatch).
- Excellent problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
Skills
Python, Kubernetes, Docker, CI/CD, AWS