Site Reliability Engineer Remote - Wesco
Lead reliability initiatives for cloud‑native services, designing scalable architecture, automating deployments, and mentoring engineers while ensuring high availability and performance on AWS platforms.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using IaC tools such as Terraform.
- Develop and manage container orchestration platforms (Kubernetes) and CI/CD pipelines to support rapid, reliable releases.
- Monitor system health, troubleshoot incidents, and drive root‑cause analysis to improve reliability and performance.
- Provide technical guidance and mentorship to associate engineers and cross‑functional teams.
- Collaborate with product, development, and operations teams to define architectural requirements and ensure alignment with service level objectives.
Requirements
- 5+ years of experience in site reliability or DevOps engineering, with a strong focus on Linux environments.
- Proficiency in cloud services (AWS) and container orchestration (Kubernetes).
- Hands‑on experience with infrastructure as code (Terraform) and scripting/automation using Python.
- Solid understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
- Excellent problem‑solving skills and ability to mentor junior engineers.
Skills
Linux, Kubernetes, AWS, Terraform, Python, CI/CD