Remote / On-site
Senior Go Backend Engineer - GPU Monitoring & Control - Solutionzhere
Backend Engineer
Lead the design and implementation of production‑grade backend services in Go, driving GPU health monitoring, observability, diagnostics, and control across enterprise‑scale infrastructure. Deliver scalable, reliable systems that empower next‑generation server management.
About the role
Key Responsibilities
- Architect and develop high‑throughput Go services for GPU telemetry collection, health monitoring, and control commands.
- Design observability pipelines (metrics, logs, traces) to surface GPU performance and failure data to operators.
- Implement diagnostics tooling that automatically identifies GPU faults and suggests remediation steps.
- Collaborate with front‑end and platform teams to expose REST/gRPC APIs for GPU management.
- Ensure system reliability, scalability, and security through rigorous testing, CI/CD, and cloud‑native best practices.
Requirements
- 5+ years of backend engineering experience, with deep proficiency in Go.
- Hands‑on experience building monitoring, observability, and control systems for GPUs or similar high‑performance hardware.
- Strong knowledge of distributed systems, container orchestration (Kubernetes), and cloud platforms (AWS/GCP).
- Proficiency in designing RESTful and gRPC APIs and in working with time‑series databases.
- Excellent problem‑solving skills and a passion for building reliable, production‑grade services.