Remote, Onsite

Site Reliability Engineer III - JPMorganChase

Site Reliability Engineer

Senior Site Reliability Engineer focused on building and optimizing cloud-native infrastructure for AI/ML and data platforms using Python, AWS, Kubernetes, Terraform, Docker, and CI/CD pipelines. The role ensures high availability and performance through robust monitoring and automation.

About the role

Key Responsibilities

Design, implement, and maintain scalable, highly available infrastructure for AI/ML and data platform services on AWS.
Develop and manage Kubernetes clusters, Helm charts, and Terraform modules to automate deployment and configuration.
Build and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools to streamline code delivery.
Implement monitoring, alerting, and logging solutions with Prometheus, Grafana, and the ELK stack to ensure system reliability.
Collaborate with data scientists, developers, and security teams to optimize performance, cost, and compliance.

Requirements

5+ years of experience in site reliability engineering or DevOps roles.
Proficiency in Python scripting and automation.
Hands‑on experience with AWS services (EKS, EC2, S3, RDS, CloudWatch).
Strong knowledge of Kubernetes, Helm, and Terraform for infrastructure as code.
Experience with CI/CD, containerization (Docker), and monitoring tools (Prometheus, Grafana).

Skills

Python, AWS, Kubernetes, Terraform, Docker, CI/CD, Prometheus

Company: JPMorganChase

Department: Engineering

Location: Jersey City, NJ, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026