Remote
Senior Backend Software Engineer, Observability - Nebius
Software Engineer
Lead the design and implementation of scalable observability services for a cloud-native AI platform, leveraging Python, Go, Kubernetes, and AWS to deliver robust monitoring, tracing, and alerting solutions.
About the role
Key Responsibilities
- Architect and develop high‑performance observability services in Python and Go, integrating with Kubernetes and AWS infrastructure.
- Design and maintain distributed tracing, metrics collection, and log aggregation pipelines using Prometheus, Grafana, and OpenTelemetry.
- Collaborate with cross‑functional teams to define SLAs, alerting rules, and dashboards that support AI workloads.
- Optimize service reliability and scalability, performing root‑cause analysis and implementing automated remediation.
- Mentor junior engineers and contribute to best‑practice documentation for observability tooling.
Requirements
- 5+ years of backend engineering experience with a focus on observability or monitoring.
- Proficiency in Python and Go, with a solid understanding of microservices architecture.
- Hands‑on experience deploying and managing services on Kubernetes and AWS.
- Deep knowledge of Prometheus, Grafana, OpenTelemetry, and distributed tracing concepts.
- Strong problem‑solving skills and a passion for building reliable, scalable systems.
Skills
Python, Go, Kubernetes, AWS, Prometheus, Grafana