onsite
Staff Site Reliability Developer, Google Unified Security and Threat Operations
Systems Engineer
About the role
Lead the design, deployment, and operation of large‑scale, fault‑tolerant systems for unified security and threat operations, leveraging Python, Go, Kubernetes, and Google Cloud to ensure reliability, scalability, and rapid incident response.
Key Responsibilities
- Architect, build, and maintain highly available, distributed services that support security and threat detection at scale.
- Implement automated monitoring, alerting, and incident response workflows using Google Cloud operations and observability tools.
- Collaborate with security, product, and infrastructure teams to define reliability SLAs and improve system resilience.
- Drive continuous improvement of deployment pipelines, configuration management, and chaos engineering practices.
- Mentor junior SREs and share best practices across the organization.
Requirements
- Bachelor’s degree in Computer Science or a related field (Master’s preferred).
- 8+ years of software development experience in languages such as Python or Go.
- 3+ years of hands‑on experience designing, analyzing, and troubleshooting distributed systems.
- Proficiency with Kubernetes, container orchestration, and Google Cloud Platform services.
- Strong understanding of monitoring, logging, and incident management best practices.