onsite
Site Reliability Manager, Data Center Networking, SRE - Google
Site Reliability Engineer
About the role
Lead a team of SREs focused on data‑center networking, driving reliability and performance through automation, cloud networking, and distributed systems engineering using Python, Go, and Kubernetes.
Key Responsibilities
- Lead, mentor, and grow a team of Site Reliability Engineers responsible for data‑center networking infrastructure.
- Design, implement, and maintain automation frameworks and tooling (Python, Go) to improve reliability, capacity planning, and incident response.
- Develop and operate Kubernetes‑based platforms and networking services, ensuring high availability and low latency.
- Analyze, troubleshoot, and resolve complex distributed-system failures across Linux environments.
- Collaborate with product, security, and infrastructure teams to define SLAs, SLOs, and reliability best practices.
Requirements
- Bachelor's degree in Computer Science or related field, or equivalent practical experience.
- 8+ years of software development experience, with strong knowledge of data structures and algorithms.
- 3+ years of experience managing technical teams or projects, including design and troubleshooting of distributed systems.
- Proficiency in Python and Go, and deep experience with Linux, Kubernetes, and cloud networking technologies.
- Demonstrated ability to drive reliability initiatives, define metrics, and implement automation at scale.
Skills
Python, Go, Linux, Kubernetes