Remote / Onsite
Senior Site Reliability Engineer - Metantz
Site Reliability Engineer
Senior SRE focused on application monitoring, automation, and performance optimization for a 24x7 FX trading platform, leveraging cloud infrastructure, container orchestration, and observability tools.
About the role
Key Responsibilities
- Design, implement, and maintain automated monitoring and alerting solutions for high‑frequency FX trading applications.
- Develop and manage infrastructure-as-code using Terraform and AWS services to ensure reliable, repeatable deployments.
- Build and optimize CI/CD pipelines, integrating testing, security, and performance checks.
- Collaborate with development and product teams to improve application reliability, reduce MTTR, and enhance user experience.
- Lead incident response, perform root‑cause analysis, and drive post‑mortem improvements.
Requirements
- 5+ years of SRE or DevOps experience in a high‑availability, low‑latency trading or financial services environment.
- Strong programming/scripting skills in Python, Go, and Bash.
- Hands‑on experience with Kubernetes, Docker, and cloud platforms (AWS).
- Proficiency with observability tools such as Prometheus, Grafana, and logging solutions.
- Solid understanding of CI/CD concepts, Terraform, and infrastructure automation.
Skills
Python, Go, Kubernetes, Prometheus, Terraform, AWS, CI/CD, Bash