onsite
Cloud Reliability Engineer - Versant
Software Engineer
Design, implement, and maintain highly available cloud infrastructure on AWS, using Kubernetes, Terraform, and Python automation to ensure reliable, performant services and rapid delivery.
About the role
Key Responsibilities
- Architect, build, and operate scalable, secure AWS environments supporting critical media and entertainment applications.
- Develop and maintain infrastructure-as-code using Terraform and automate deployment pipelines with CI/CD tools.
- Implement and manage container orchestration with Kubernetes, ensuring high availability and efficient resource utilization.
- Monitor system health and performance using observability platforms; create alerts, track incidents, and conduct root‑cause analysis.
- Collaborate with development and product teams to embed reliability best practices into the software development lifecycle.
Requirements
- 3+ years of experience in cloud operations, preferably on AWS.
- Strong proficiency with Kubernetes, Terraform, and scripting in Python.
- Hands‑on experience building CI/CD pipelines and implementing monitoring/alerting solutions.
- Solid understanding of networking, security, and disaster‑recovery concepts in cloud environments.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, cross‑functional team.
Skills
AWS, Kubernetes, Terraform, Python, CI/CD