Remote
Senior Engineer, Network Observability - CoreWeave
Software Engineer
Lead the design and implementation of advanced network observability solutions, driving telemetry collection, analysis, and visualization across cloud-native environments using Prometheus, Grafana, and OpenTelemetry.
About the role
Key Responsibilities
- Architect and build scalable telemetry pipelines for distributed network services, ingesting metrics, logs, and traces into a unified observability platform.
- Integrate Prometheus, Grafana, and OpenTelemetry with Kubernetes and cloud networking components to provide real‑time visibility and alerting.
- Develop and maintain custom exporters, collectors, and dashboards in Go and Python, ensuring high reliability and performance.
- Collaborate with DevOps and security teams to implement observability best practices, including data retention, compliance, and incident response workflows.
- Mentor junior engineers, conduct code reviews, and drive continuous improvement of observability tooling and processes.
Requirements
- 5+ years of experience in network monitoring, observability, or telemetry engineering.
- Deep understanding of distributed systems, microservices, and cloud networking (VPC, load balancers, service meshes).
- Excellent problem‑solving, communication, and collaboration skills.
Skills
Prometheus, Grafana, Kubernetes, Python, Go