remote

Platform Operations Engineer (Site Reliability Engineer) - Vertiv

Platform Operations Engineer (SRE) driving cross‑platform observability, monitoring, and incident response across a diverse digital ecosystem using Python, Node.js, Kubernetes, Prometheus, and Grafana.

About the role

Key Responsibilities

Design, implement, and maintain end‑to‑end monitoring and alerting pipelines for a multi‑tool digital platform stack.
Own incident response workflows, root‑cause analysis, and post‑mortem documentation to improve reliability.
Collaborate with development and product teams to define SLAs, SLOs, and reliability metrics.
Automate operational tasks and configuration management using Python and Node.js scripts.
Integrate observability solutions with enterprise tools such as Compass AI, Writer AI, Site Scope, UiPath, Workato, and Cursor.

Requirements

3+ years of SRE or platform operations experience in a cloud‑native environment.
Hands‑on experience with incident management platforms and post‑mortem processes.
Excellent communication skills and a collaborative mindset.

Skills

Python, Node.js, Kubernetes, Prometheus, Grafana

Company: Vertiv

Department: Operations

Location: Westerville, OH, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026