Senior Site Reliability Engineer - UJET (Onsite)
Site Reliability Engineer
About the role
As a Senior Site Reliability Engineer, you will lead cloud infrastructure, DevOps practices, and system reliability for high-availability services, using modern cloud platforms and automation tools.
Key Responsibilities
- Design, implement, and maintain scalable, highly available cloud infrastructure across multiple regions.
- Lead DevOps initiatives, including CI/CD pipelines, configuration management, and automated monitoring.
- Collaborate with development teams to embed reliability and performance best practices into the software delivery lifecycle.
- Diagnose and resolve complex production incidents, driving post‑mortem analysis and continuous improvement.
- Mentor junior engineers and foster a culture of operational excellence and proactive risk mitigation.
Requirements
- 5+ years of experience in site reliability engineering or related roles.
- Proficiency with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools (Terraform, CloudFormation).
- Strong scripting skills in Python or Go, and experience with container orchestration (Kubernetes).
- Deep understanding of monitoring, alerting, and incident response frameworks.
- Excellent communication skills and a collaborative mindset.
Skills
Kubernetes, Prometheus, Grafana