onsite
Senior Site Reliability Engineer - IXL Learning
Site Reliability Engineer
Senior Site Reliability Engineer responsible for the high availability, performance, and scalability of cloud‑native services, working with Kubernetes, AWS, and Terraform and automating operations with Python and CI/CD pipelines.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using infrastructure-as-code (IaC) tools such as Terraform.
- Operate and optimize Kubernetes clusters, ensuring reliable deployment, monitoring, and incident response.
- Develop automation scripts and tools in Python to streamline operational tasks and improve system reliability.
- Build and maintain CI/CD pipelines that support rapid, safe delivery of code to production.
- Collaborate with development teams to define service level objectives (SLOs), conduct capacity planning, and drive performance tuning.
Requirements
- 5+ years of experience in site reliability or production engineering roles.
- Deep expertise with Kubernetes orchestration and AWS services (EC2, RDS, S3, etc.).
- Proficiency in Terraform for infrastructure provisioning and Python for automation.
- Strong background in building and maintaining CI/CD pipelines and monitoring/alerting systems.
- Excellent problem‑solving skills, with a focus on proactive performance optimization and incident management.
Skills
Kubernetes, AWS, Terraform, Python, CI/CD