onsite

Staff Site Reliability Engineer, Cloud Reliability Intelligence - Google

Site Reliability Engineer

Senior SRE leader driving reliability for cloud services, designing end‑to‑end observability, automation, and AI‑powered workflow improvements across full‑stack architectures.

About the role

Key Responsibilities

Architect, implement, and operate highly reliable, scalable cloud services supporting critical workloads.
Lead cross‑functional teams to design, analyze, and troubleshoot distributed systems, ensuring end‑to‑end performance and availability.
Develop and integrate AI/LLM‑based automation to streamline incident response, root‑cause analysis, and operational workflows.
Define and enforce reliability standards, service‑level objectives, and policy conformance across the organization.
Mentor engineers, drive technical roadmaps, and oversee project delivery from concept through production.

Requirements

8+ years of experience with data structures, algorithms, and large‑scale system design.
3+ years of hands‑on leadership in building and operating distributed, full‑stack cloud architectures.
Proven track record of applying Generative AI or LLMs to automate reliability and operational processes.
Deep expertise in site reliability engineering practices, including monitoring, alerting, incident management, and capacity planning.
Strong communication and mentorship skills, with the ability to influence technical direction across multiple teams.

Skills

Generative AI

Company: Google

Department: Engineering

Location: Sunnyvale, California, United States

Experience: 8+ years

Tenure: Full-time

Level: Lead

Posted June 25, 2026