remote
Observability Engineer Prometheus / Grafana / Datadog - BV Teck
Software Engineer
This Observability Engineer role focuses on Prometheus, Grafana, and Datadog: designing, implementing, and maintaining end‑to‑end monitoring solutions that ensure high availability and performance of cloud‑native services.
About the role
Key Responsibilities
- Design, deploy, and manage Prometheus, Grafana, and Datadog monitoring stacks across multi‑environment infrastructures.
- Develop and maintain custom dashboards, alerting rules, and data visualizations to provide actionable insights for engineering and product teams.
- Collaborate with DevOps and SRE teams to integrate observability into CI/CD pipelines and incident response workflows.
- Analyze performance metrics, logs, and traces to identify bottlenecks, troubleshoot incidents, and recommend capacity planning improvements.
- Document monitoring architecture, best practices, and runbooks to enable knowledge transfer and continuous improvement.
Requirements
- 3+ years of experience with Prometheus, Grafana, and Datadog in production environments.
- Strong understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Proficiency in scripting (Python, Bash) and infrastructure as code (Terraform, Helm).
- Excellent problem‑solving skills and ability to communicate complex technical concepts to cross‑functional teams.
- Experience with alerting best practices, incident management, and post‑mortem analysis.
Skills
prometheus, grafana, datadog