Onsite
Senior Site Reliability Engineer - CCC Intelligent Solutions
Site Reliability Engineer
About the role
Senior SRE leading reliability and automation for large‑scale cloud platforms, focusing on Kubernetes, Terraform, AWS, and observability tooling.
Key Responsibilities
- Design, implement, and operate highly available Kubernetes clusters across multiple cloud regions.
- Automate infrastructure provisioning and configuration management using Terraform and Python scripts.
- Develop and maintain CI/CD pipelines to support rapid, reliable deployments.
- Implement monitoring, alerting, and performance tuning with Prometheus, Grafana, and related observability tools.
- Collaborate with development and product teams to define SLOs, conduct incident response, and drive post‑mortem analyses.
Requirements
- 5+ years of experience in site reliability or production engineering roles.
- Deep expertise with Kubernetes, container orchestration, and cloud platforms (AWS preferred).
- Strong proficiency in infrastructure as code (Terraform) and scripting (Python, Bash).
- Hands‑on experience with monitoring, logging, and alerting stacks such as Prometheus/Grafana.
- Solid understanding of Linux systems, networking, and CI/CD concepts.
Skills
Kubernetes, Terraform, Python, AWS, Prometheus, CI/CD, Linux