Future Openings - SRE Support Engineer - Observability
SRE Support Engineer - Observability
While this position is not currently open, we are interviewing strong candidates for upcoming opportunities on this team.
Location: Remote | Time Zone: (US, Canada, Brazil, Chile, Colombia, Mexico) (8AM–5PM Pacific)
Freedom to grow. Power to deliver. Virtasant is a global technology services company delivering large-scale cloud, data, and engineering solutions across 130+ countries. We partner with some of the world’s largest organizations to help them build, operate, and scale internal platforms used by tens of thousands of engineers.
For this role, you will be supporting one of the most advanced internal developer platforms in the world, powering products used by hundreds of millions of people. The problems you will solve are deep, complex, and essential to keeping a global-scale organization moving.
Role Overview
The Observability & Tools Support Engineer provides high-impact technical support for customers of a large technology company’s internal IaaS platform, with a focus on monitoring, alerting, telemetry, and operational tooling.
This role spans a wide range of support, from white-glove onboarding and end-to-end customer enablement to deep technical troubleshooting across Linux, networking, and observability systems (especially Prometheus and AlertManager). You will also contribute to improving the support function itself: strengthening tooling, documentation, workflows, and feedback loops so the service scales.
Success depends on excellent troubleshooting, strong written communication, comfort working with highly technical customers, and the maturity to identify patterns and drive operational improvements beyond individual ticket resolution.
Business Outcome
Become a trusted frontline expert for the customer’s observability ecosystem and operational tooling: delivering fast, accurate support across Slack and tickets, improving monitoring reliability, and reducing incident impact through better triage, troubleshooting, onboarding, and knowledge capture.
Success Measures
Healthy volume of threads and tickets handled with high-quality outcomes
Consistent achievement of time-based SLAs
High customer satisfaction through surveys
Accurate classification of issue type, severity, and recurring patterns
Reduced repeat issues through better docs, tooling, and scalable onboarding
What Will Be True When You Succeed
Customers can onboard smoothly to monitoring/alerting with minimal friction
Monitoring and alerting issues are resolved quickly, with fewer escalations
Linux and networking-related incidents reach resolution faster due to strong troubleshooting and clean handoffs
Engineering and SRE teams receive clear, actionable feedback based on real customer trends
Knowledge base content prevents
Posted June 12, 2026