Senior Lead Site Reliability Engineer - ETS Network - JPMorganChase
Lead the design, implementation, and operation of highly available, scalable services on AWS and Kubernetes, and drive service level objectives (SLOs), non‑functional requirements (NFRs), and continuous improvement for electronic trading platforms.
About the role
Key Responsibilities
- Define and enforce non‑functional requirements and availability targets for mission‑critical trading services.
- Architect and maintain resilient, scalable infrastructure on AWS and Kubernetes, ensuring high availability and performance.
- Implement robust monitoring, alerting, and incident response processes so that services meet or exceed their SLOs.
- Collaborate with development teams to embed SRE best practices into design, code reviews, and testing.
- Lead post‑incident reviews, root cause analysis, and continuous improvement initiatives.
Requirements
- 10+ years of experience in site reliability or DevOps roles, with a strong focus on high‑frequency trading or financial services.
- Deep expertise in AWS services (EC2, RDS, ECS/EKS, CloudWatch) and Kubernetes cluster management.
- Proficiency with CI/CD pipelines, infrastructure as code (Terraform, CloudFormation), and scripting (Python, Bash).
- Strong knowledge of monitoring tools (Prometheus, Grafana, Datadog) and incident management platforms.
- Excellent communication skills and a proven ability to lead cross‑functional teams in a fast‑paced environment.