Site Reliability Analyst - Canadian Tire Corporation, Ltd.
Onsite | Software Engineer
About the role
The Site Reliability Analyst keeps production systems reliable through incident response, automation, monitoring, and CI/CD pipeline support, working closely with senior SREs and DevOps teams.
Key Responsibilities
- Respond to production incidents, perform root‑cause analysis, and drive timely resolution.
- Develop and maintain automation scripts and tools to reduce manual operational toil.
- Implement and enhance monitoring, alerting, and observability solutions for critical services.
- Support CI/CD pipelines, ensuring reliable build, test, and deployment processes.
- Collaborate with senior SREs, DevOps engineers, and platform teams to adopt industry‑standard reliability practices.
- Use AI‑assisted tooling to improve operational efficiency (deep machine‑learning expertise is not required).
Requirements
- Strong experience with scripting languages such as Python and Bash.
- Hands‑on knowledge of container orchestration platforms, preferably Kubernetes.
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI, Azure DevOps) and infrastructure‑as‑code frameworks like Terraform.
- Proven ability to design, implement, and maintain monitoring and alerting systems (e.g., Prometheus, Grafana, CloudWatch).
- Excellent problem‑solving skills and a collaborative mindset for working with cross‑functional engineering teams.
Skills
Python, Bash, Kubernetes, CI/CD, Terraform