onsite

Site Reliability Engineer - Production Support - Charles Schwab

This Site Reliability Engineer role focuses on production support, keeping cloud‑native services highly available using AWS, Python, and Bash. Responsibilities include monitoring, incident response, and continuous improvement of reliability practices.

About the role

Key Responsibilities

Maintain and enhance the reliability of production services hosted on AWS, ensuring 99.9% uptime.
Implement and manage monitoring solutions with Prometheus and Grafana, creating dashboards and alerting rules.
Lead incident response, perform root‑cause analysis, and drive post‑mortem documentation.
Automate deployment pipelines and configuration management using Python scripts and CI/CD tools.
Collaborate with development teams to embed reliability best practices into the software development lifecycle.

Requirements

3+ years of experience in site reliability or production support roles.
Strong proficiency with AWS services (EC2, RDS, S3, CloudWatch).
Hands‑on scripting skills in Python and Bash.
Experience with monitoring/alerting tools such as Prometheus, Grafana, or similar.
Excellent problem‑solving skills and ability to work under pressure during incidents.

Skills

AWS, Python, Bash, Prometheus, Grafana

Company: Charles Schwab

Department: Support

Location: Southlake, TX, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026