Remote / On-site

Sr Manager, AI Site Reliability Engineer - Core Enterprise Services - Charles Schwab

Site Reliability Engineer

Lead AI Site Reliability Engineering for core enterprise services, driving scalable, resilient infrastructure on AWS with Kubernetes, Docker, and Terraform, while applying Python for automation and monitoring.

About the role

Key Responsibilities

Design, implement, and maintain highly available AI workloads on AWS using Kubernetes and Docker.
Develop and manage CI/CD pipelines with Terraform, ensuring secure and repeatable deployments.
Automate monitoring, alerting, and incident response using Python scripts and cloud-native tools.
Collaborate with data science and product teams to optimize AI model performance and reliability.
Lead capacity planning, cost optimization, and performance tuning for large-scale AI services.

Requirements

10+ years of experience in site reliability engineering with a focus on AI/ML workloads.
Proficiency in AWS services (EKS, ECS, Lambda, CloudWatch) and Kubernetes cluster management.
Strong scripting skills in Python and experience with Terraform or similar IaC tools.
Deep understanding of containerization, CI/CD, and observability best practices.
Excellent communication and leadership skills, with a track record of mentoring teams.

Skills

Kubernetes, Docker, AWS, Python, Terraform

Company: Charles Schwab

Department: Engineering

Location: Telangana, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted: June 19, 2026