Remote
Automation Support Analyst - Site Reliability Engineering - OMERS
Software Engineer
About the role
Site Reliability Engineering analyst focused on automating infrastructure, improving reliability, and supporting production services using Python, Bash, Kubernetes, Terraform, and AWS cloud technologies.
Key Responsibilities
- Design, develop, and maintain automation scripts and tools (Python, Bash) to streamline provisioning, configuration, and deployment of services.
- Manage and operate containerized workloads on Kubernetes clusters, ensuring high availability and performance.
- Implement Infrastructure as Code using Terraform to provision and version‑control AWS resources.
- Build and maintain CI/CD pipelines that enable rapid, reliable releases while enforcing quality gates.
- Monitor system health, set up alerts, and perform root‑cause analysis for incidents, driving continuous improvement of reliability.
Requirements
- 3+ years of experience in an SRE or automation role, with strong Linux administration skills.
- Proficiency in Python and Bash scripting for automation and tooling.
- Hands‑on experience with Kubernetes orchestration and AWS cloud services.
- Solid understanding of Terraform or similar IaC frameworks.
- Experience with monitoring and logging tools (e.g., CloudWatch, Prometheus, Grafana) and with incident response processes.
Skills
Python, Bash, Linux, Kubernetes, Terraform, AWS, CI/CD