onsite
Site Reliability Engineer - Snapp
Site Reliability Engineer at Snapp focused on automating workflows, maintaining stable staging environments, and enhancing observability stacks to support QA and development teams with reliable infrastructure and consistent on‑call support.
About the role
Site Reliability Engineer at Snapp.
Key technologies: Kubernetes, Prometheus, Grafana.
Key Responsibilities
- Define and track SLOs, SLIs and error budgets
- Design and implement observability stacks (metrics, logging, tracing)
- Automate toil and improve system reliability through engineering
- Conduct post-mortems and drive blameless incident retrospectives
Requirements
- 3+ years of relevant experience in site reliability engineering
- Proficiency with monitoring tools (Prometheus, Grafana, Datadog)
- Strong programming skills for automation and tooling
Skills
Kubernetes, CI/CD, Python