Onsite
Senior Software Engineer - Site Reliability Engineering - Google
Software Engineer
Lead the design, implementation, and operation of large‑scale distributed services on Google Cloud, driving reliability, performance, and incident response for mission‑critical systems.
About the role
Key Responsibilities
- Architect and develop highly available, scalable services using Python, Go, and Kubernetes on Google Cloud Platform.
- Own end‑to‑end reliability of production systems, including capacity planning, performance tuning, and fault tolerance.
- Lead on‑call rotations, diagnose and resolve incidents, and drive post‑mortem analysis to prevent recurrence.
- Collaborate with cross‑functional teams to define telemetry, monitoring, and alerting strategies that provide actionable insights.
- Mentor junior engineers, review code, and set technical direction for SRE initiatives.
Requirements
- Bachelor’s degree in Computer Science or a related field (Master’s preferred).
- 5+ years of software development experience in one or more languages.
- 3+ years designing, analyzing, and troubleshooting large‑scale distributed systems.
- Strong experience with Kubernetes, container orchestration, and cloud infrastructure.
- Proven track record in incident response, telemetry, and risk management.