onsite

Site Reliability Manager, Data Center Networking, SRE - Google

Site Reliability Engineer

Lead a team of SREs focused on data‑center networking, driving reliability and performance through automation, cloud networking, and distributed systems engineering using Python, Go, and Kubernetes.

About the role

Key Responsibilities

Lead, mentor, and grow a team of Site Reliability Engineers responsible for data‑center networking infrastructure.
Design, implement, and maintain automation frameworks and tooling (Python, Go) to improve reliability, capacity planning, and incident response.
Develop and operate Kubernetes‑based platforms and networking services, ensuring high availability and low latency.
Analyze, troubleshoot, and resolve complex distributed system failures across Linux environments.
Collaborate with product, security, and infrastructure teams to define SLAs, SLOs, and reliability best practices.

Requirements

Bachelor's degree in Computer Science or related field, or equivalent practical experience.
8+ years of software development experience with strong data structures and algorithms knowledge.
3+ years of experience managing technical teams or projects, including design and troubleshooting of distributed systems.
Proficiency in Python and Go, and deep experience with Linux, Kubernetes, and cloud networking technologies.
Demonstrated ability to drive reliability initiatives, define metrics, and implement automation at scale.

Skills

Python, Go, Linux, Kubernetes

Company: Google

Department: Engineering

Location: Waterloo, CA, United States

Experience: 8+ years

Tenure: full-time

Level: Mid-Level

Posted June 19, 2026