onsite
Senior Infrastructure Site Reliability Engineer - Charles Schwab
Site Reliability Engineer
Senior Site Reliability Engineer focused on designing, automating, and operating scalable cloud infrastructure using Kubernetes, Terraform, and AWS, while driving reliability through Python scripting, CI/CD pipelines, and advanced monitoring.
About the role
Key Responsibilities
- Design, build, and maintain highly available, scalable infrastructure on AWS supporting critical financial services.
- Develop and manage Kubernetes clusters, including networking, security, and performance tuning.
- Automate provisioning and configuration using Terraform and Python to enable rapid, repeatable deployments.
- Implement and maintain CI/CD pipelines that deliver code and infrastructure changes with zero downtime.
- Create observability solutions (metrics, logs, and alerts) to detect and resolve incidents proactively.
- Collaborate with development, security, and product teams to embed reliability best practices throughout the software lifecycle.
Requirements
- 5+ years of experience in site reliability or infrastructure engineering, preferably in a financial or regulated environment.
- Deep expertise with Linux systems, Kubernetes orchestration, and AWS services (EC2, RDS, S3, IAM, etc.).
- Proficiency in infrastructure‑as‑code tools such as Terraform and scripting languages like Python.
- Hands‑on experience building CI/CD pipelines using tools such as Jenkins, GitLab CI, or GitHub Actions.
- Strong background in monitoring, logging, and alerting platforms (Prometheus, Grafana, ELK, CloudWatch) and a track record of improving system reliability.
Skills
Linux, Kubernetes, Terraform, Python, AWS, CI/CD