remote

Site Reliability Engineer - Software Ops and Scaling, One Material Handling System - Software, Controls - Amazon.com

Site Reliability Engineer

Site Reliability Engineer focused on scaling automation infrastructure and driving DevOps practices with Python, AWS, Docker, Kubernetes, CI/CD pipelines, Terraform, and monitoring to keep operations reliable and efficient at scale.

About the role

Key Responsibilities

Design, build, and maintain scalable automation infrastructure using AWS services and container orchestration (Docker, Kubernetes).
Develop and manage CI/CD pipelines to accelerate deployment cycles and ensure high code quality.
Implement infrastructure-as-code with Terraform, automating provisioning and configuration across environments.
Collaborate with development teams to bridge gaps between code delivery and production operations.
Monitor system health, troubleshoot incidents, and implement proactive performance improvements.

Requirements

Proven experience as a Site Reliability Engineer or DevOps Engineer in a large-scale environment.
Strong scripting skills in Python and familiarity with AWS services (EC2, S3, Lambda, CloudWatch).
Hands-on experience with Docker, Kubernetes, and CI/CD tools (Jenkins, GitLab CI, Argo CD).
Proficiency in infrastructure-as-code using Terraform or similar tools.
Excellent problem-solving skills, the ability to work collaboratively across teams, and a passion for automation and reliability.

Skills

Python, AWS, Docker, Kubernetes, CI/CD, Terraform

Company: Amazon.com

Department: Operations

Location: Nashville, TN, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026