onsite

SRE Engineer / Site Reliability Engineer Specialist - NTT Data Americas, Inc.

Site Reliability Engineer

Senior SRE Engineer responsible for designing, deploying, and maintaining highly available cloud-native services using Kubernetes, Docker, and CI/CD pipelines. Leverages AWS, monitoring tools, and Python scripting to ensure reliability, performance, and rapid incident response.

About the role

Key Responsibilities

Design, implement, and operate scalable, highly available services on Kubernetes clusters in AWS.
Build and maintain CI/CD pipelines to automate application delivery and infrastructure changes.
Implement monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to detect and resolve incidents quickly.
Collaborate with development teams to embed reliability best practices into the software development lifecycle.
Conduct post‑mortems, root cause analysis, and continuous improvement initiatives to reduce MTTR.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles.
Proficient with Kubernetes, Docker, and cloud platforms (AWS preferred).
Strong scripting skills in Python or Bash for automation.
Hands‑on experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD).
Excellent problem‑solving skills and ability to work in a fast‑paced environment.

Skills

Kubernetes, Docker, CI/CD, AWS, Python

Company: NTT Data Americas, Inc.

Department: Engineering

Location: Addison, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026