Senior Software Engineer, Observability - Okta
Location: Remote
Category: Software Engineer
About the role
Lead the design and implementation of observability tooling for a high‑scale identity platform, driving end‑to‑end monitoring, instrumentation, and alerting across distributed services using Python, Go, and AWS.
Key Responsibilities
- Architect and build scalable observability pipelines that ingest metrics, logs, and traces from a globally distributed identity platform.
- Design and implement instrumentation libraries for microservices, ensuring consistent telemetry across languages.
- Develop and maintain dashboards, alerting rules, and automated remediation workflows in AWS CloudWatch, Grafana, and Prometheus.
- Collaborate with platform, security, and reliability teams to define SLAs, SLOs, and incident response playbooks.
- Lead root‑cause analysis and post‑mortem investigations, translating findings into actionable improvements.
Requirements
- 5+ years of software engineering experience with a focus on observability, monitoring, or site reliability.
- Strong proficiency in Python and Go, with experience building production‑grade telemetry libraries.
- Hands‑on experience with distributed tracing (OpenTelemetry, Jaeger), metrics (Prometheus, CloudWatch), and log aggregation (ELK, Loki).
- Deep understanding of cloud infrastructure, especially AWS services such as CloudWatch, Kinesis, and Lambda.
- Excellent problem‑solving and communication skills, and the ability to work in a fast‑paced, high‑impact environment.