Remote
Senior Site Reliability Engineer - Experian
Site Reliability Engineer
Senior Site Reliability Engineer responsible for designing, automating, and scaling highly available cloud infrastructure, leveraging Kubernetes, Terraform, and AWS while implementing robust monitoring and incident response processes.
About the role
Key Responsibilities
- Design, build, and maintain scalable, resilient infrastructure on AWS using IaC tools such as Terraform.
- Deploy, manage, and optimize containerized workloads with Kubernetes, ensuring high performance and availability.
- Develop automation scripts and tools in Python to streamline operational tasks and improve reliability.
- Implement comprehensive monitoring, alerting, and logging solutions (e.g., Prometheus, Grafana, ELK) to proactively detect and resolve incidents.
- Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve system stability.
Requirements
- 5+ years of experience in site reliability or DevOps engineering, with deep expertise in Linux environments.
- Proven hands‑on experience with Kubernetes orchestration and AWS services (EC2, RDS, S3, IAM, etc.).
- Strong proficiency in infrastructure as code using Terraform or similar tools.
- Solid programming/scripting skills in Python for automation and tooling.
- Experience with monitoring, alerting, and logging frameworks and a track record of implementing CI/CD pipelines.
Skills
Linux, Kubernetes, Terraform, Python, AWS