Remote
Senior Site Reliability Engineer (SRE) - Tradeweb
Site Reliability Engineer
Senior Site Reliability Engineer driving high‑availability, scalable infrastructure for a global electronic trading platform using Kubernetes, Docker, AWS, and Terraform to ensure 99.99% uptime and rapid incident response.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for a global electronic trading platform.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to continuously improve reliability.
- Automate deployment pipelines with CI/CD tools, Terraform, and container orchestration (Kubernetes).
- Monitor system health using Prometheus, Grafana, and custom alerts; optimize performance and cost.
- Collaborate with development, security, and product teams to embed reliability best practices into the software lifecycle.
Requirements
- 5+ years of SRE or DevOps experience in a high‑frequency trading or financial services environment.
- Proficient with Kubernetes, Docker, and cloud platforms (AWS preferred).
- Strong scripting skills (Python, Bash) and experience with IaC tools (Terraform, CloudFormation).
- Hands‑on experience with monitoring, alerting, and incident management tools (Prometheus, Grafana, PagerDuty).
- Excellent communication, problem‑solving, and collaboration skills.
Skills
Kubernetes, Docker, AWS, Terraform