Remote / Onsite
Specialist, AI Site Reliability Engineer & Ops - Core Enterprise Services - Charles Schwab
Site Reliability Engineer
AI Site Reliability Engineer & Ops specialist driving reliability, automation, and performance for core enterprise services using Python, Kubernetes, AWS, Terraform, and CI/CD pipelines.
About the role
Key Responsibilities
- Design, implement, and maintain highly available AI-driven services on Kubernetes clusters in AWS.
- Automate deployment, scaling, and monitoring using Terraform, CI/CD pipelines, and observability tools.
- Collaborate with data science and product teams to integrate ML models into production workflows.
- Diagnose and resolve performance bottlenecks so that core enterprise services meet their SLAs.
- Implement security best practices, compliance checks, and incident response procedures.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles.
- Proficiency with Python, Kubernetes, AWS services (EKS, ECS, S3, CloudWatch), and Terraform.
- Strong background in CI/CD tooling (GitHub Actions, Jenkins, ArgoCD) and monitoring (Prometheus, Grafana).
- Experience with AI/ML model deployment and performance tuning.
- Excellent problem-solving skills and a collaborative mindset.
Skills
Python, Kubernetes, AWS, Terraform, CI/CD