Remote
Senior Site Reliability Engineer - Garmin
Site Reliability Engineer
Senior Site Reliability Engineer responsible for managing production releases, ensuring system stability, and implementing real‑time monitoring and automation using Linux, Kubernetes, Terraform, Python, and CI/CD pipelines.
About the role
Key Responsibilities
- Plan, coordinate, and execute production releases ranging from basic to complex, with clear communication and documentation throughout.
- Monitor production systems in real time, quickly identifying and resolving incidents to maintain high availability.
- Design, implement, and maintain automation frameworks for deployment, configuration management, and infrastructure provisioning.
- Collaborate with development and operations teams to improve reliability, performance, and scalability of services.
- Develop and maintain observability solutions, including metrics, logging, and alerting, using modern monitoring tools.
Requirements
- 5+ years of experience in site reliability or DevOps roles with a strong focus on Linux environments.
- Proficiency with container orchestration platforms such as Kubernetes.
- Hands‑on experience with infrastructure‑as‑code tools like Terraform.
- Strong scripting/programming skills in Python and familiarity with CI/CD pipelines.
- Demonstrated ability to design and operate monitoring and alerting systems for large‑scale production services.
Skills
Linux, Kubernetes, Terraform, Python, CI/CD