Remote
Senior Platform SRE - IG Group
Site Reliability Engineer
Senior Platform SRE responsible for designing, building, and maintaining highly available, scalable infrastructure on AWS and Kubernetes, automating deployments with Terraform, Go, and Python, and ensuring robust monitoring and incident response across a global fintech platform.
About the role
Key Responsibilities
- Design, implement, and operate highly available, scalable infrastructure on AWS and Kubernetes for a global fintech platform.
- Automate infrastructure provisioning and configuration using Terraform, Go, and Python, ensuring repeatable, auditable deployments.
- Develop and maintain CI/CD pipelines, integrating automated testing, security scanning, and blue‑green deployments.
- Implement comprehensive monitoring, alerting, and incident response processes using Prometheus, Grafana, and PagerDuty.
- Collaborate with development, security, and product teams to define and enforce best practices for reliability, performance, and cost optimization.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles within high‑scale, mission‑critical environments.
- Proficiency with AWS services (EC2, RDS, S3, VPC, IAM) and Kubernetes cluster management.
- Strong scripting and automation skills in Go and Python, with hands‑on Terraform experience.
- Deep understanding of CI/CD tooling, container orchestration, and the observability stack (Prometheus, Grafana, Loki).
- Excellent problem‑solving, communication, and collaboration skills in a fast‑paced, cross‑functional team.
Skills
Kubernetes, AWS, Terraform, Go, Python, CI/CD