Remote
Sr. Site Reliability Engineer - Synopsys
About the role
Senior Site Reliability Engineer responsible for designing, deploying, and maintaining highly available cloud-native infrastructure using Kubernetes, Docker, AWS, and Terraform, while ensuring robust monitoring, automation, and incident response across global services.
Key Responsibilities
- Design, implement, and manage scalable Kubernetes clusters and Docker-based workloads across AWS environments.
- Develop and maintain infrastructure-as-code (IaC) pipelines with Terraform, ensuring repeatable and auditable infrastructure deployments.
- Implement comprehensive monitoring, logging, and alerting solutions using Prometheus, Grafana, and the ELK stack to detect and resolve incidents proactively.
- Automate operational tasks and CI/CD pipelines with Python scripts and GitHub Actions, improving deployment velocity and reliability.
- Collaborate with development teams to enforce best practices for performance, security, and cost optimization.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with deep expertise in Kubernetes and container orchestration.
- Proficient in AWS services (EKS, EC2, S3, CloudWatch) and Terraform for infrastructure provisioning.
- Strong scripting skills in Python and experience with CI/CD tooling.
- Hands‑on experience with monitoring and observability tools such as Prometheus, Grafana, and ELK.
- Excellent problem‑solving abilities, strong communication skills, and a collaborative mindset.
Skills
Kubernetes, Docker, AWS, Terraform, Python