onsite

Senior Site Reliability Engineer I - American Express

Site Reliability Engineer

Senior Site Reliability Engineer responsible for designing and operating highly available, observable systems, automating deployments, and driving reliability best practices using cloud, container, and monitoring technologies.

About the role

Key Responsibilities

Design, implement, and maintain scalable SRE solutions on AWS, including infrastructure as code with Terraform.
Develop and manage container orchestration platforms (Kubernetes, Docker) to ensure high availability and performance.
Build and enhance real‑time observability pipelines using Prometheus, Grafana, and custom metrics.
Automate deployment, configuration, and incident response workflows with CI/CD pipelines and Python scripting.
Collaborate with development, security, and product teams to embed reliability and automation into the software lifecycle.

Requirements

5+ years of experience in site reliability, DevOps, or systems engineering.
Strong expertise with AWS services, Kubernetes, Docker, and Terraform.
Proficiency in scripting/automation using Python and CI/CD tools (Jenkins, GitHub Actions, etc.).
Hands‑on experience with monitoring and alerting stacks such as Prometheus and Grafana.
Solid understanding of networking, Linux systems, and incident management processes.

Skills

Kubernetes, Docker, AWS, Terraform, Prometheus, Python, CI/CD

Company: American Express

Department: Engineering

Location: Phoenix, Arizona, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 27, 2026