onsite
Senior Site Reliability Engineer - GCP - Charles Schwab
Site Reliability Engineer
About the role
Lead the design, implementation, and operation of highly available, scalable services on GCP, driving reliability, automation, and performance for mission‑critical applications.
Key Responsibilities
- Architect and maintain production‑grade infrastructure on Google Cloud Platform, ensuring high availability and scalability.
- Design and implement CI/CD pipelines using Terraform, Cloud Build, and Kubernetes to automate deployments and rollbacks.
- Develop and maintain monitoring, alerting, and incident response workflows with Prometheus, Grafana, and Cloud Monitoring.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
- Lead root‑cause analysis, post‑mortem documentation, and continuous improvement initiatives.
Requirements
- 5+ years of experience in site reliability engineering or DevOps roles.
- Deep expertise with GCP services (Compute Engine, Google Kubernetes Engine, Cloud Storage, Cloud SQL).
- Proficiency in infrastructure as code (Terraform, Cloud Deployment Manager) and container orchestration (Kubernetes).
- Strong scripting skills in Python or Bash for automation and tooling.
- Hands‑on experience with monitoring, logging, and alerting platforms (Prometheus, Grafana, Cloud Monitoring).
Skills
Kubernetes, Terraform, CI/CD, Python