remote
Site Reliability Engineer (SRE) - BV Teck
Join a fast‑growing tech team as a Site Reliability Engineer, driving automation, scalability, and reliability of cloud‑native applications using Python, Kubernetes, Docker, AWS, and modern DevOps tooling.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using IaC tools such as Terraform.
- Develop automation scripts and services in Python to streamline deployment, configuration, and incident response workflows.
- Manage containerized workloads built with Docker and orchestrated on Kubernetes, and ensure seamless CI/CD pipeline integration.
- Monitor system performance, set up alerting, and conduct root‑cause analysis to improve reliability and reduce mean time to recovery.
- Collaborate with development and product teams to embed reliability best practices into the software development lifecycle.
Requirements
- 3+ years of experience in site reliability, DevOps, or systems engineering roles.
- Strong proficiency in Linux administration and scripting with Python.
- Hands‑on experience with Kubernetes, Docker, and CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions).
- Deep understanding of AWS services, networking, and security best practices.
- Familiarity with infrastructure‑as‑code (Terraform or CloudFormation) and monitoring solutions (Prometheus, Grafana, CloudWatch).
Skills
Python, Linux, Kubernetes, Docker, AWS, Terraform, CI/CD