remote
Site Reliability Engineer - Software Ops and Scaling, One Material Handling System - Software, Controls - Amazon.com
Site Reliability Engineer
This Site Reliability Engineer role focuses on scaling automation infrastructure and driving DevOps excellence with Python, AWS, Docker, Kubernetes, CI/CD pipelines, and Terraform, backed by robust monitoring to keep operations reliable and efficient at scale.
About the role
Key Responsibilities
- Design, build, and maintain scalable automation infrastructure using AWS services and container orchestration (Docker, Kubernetes).
- Develop and manage CI/CD pipelines to accelerate deployment cycles and ensure high code quality.
- Implement infrastructure-as-code with Terraform, automating provisioning and configuration across environments.
- Collaborate with development teams to bridge gaps between code delivery and production operations.
- Monitor system health, troubleshoot incidents, and implement proactive performance improvements.
Requirements
- Proven experience as a Site Reliability Engineer or DevOps Engineer in a large-scale environment.
- Strong scripting skills in Python and familiarity with AWS services (EC2, S3, Lambda, CloudWatch).
- Hands‑on experience with Docker, Kubernetes, and CI/CD tools (Jenkins, GitLab CI, ArgoCD).
- Proficiency in infrastructure-as-code using Terraform or similar tools.
- Excellent problem‑solving skills, ability to work collaboratively across teams, and a passion for automation and reliability.
Skills
Python, AWS, Docker, Kubernetes, CI/CD, Terraform