onsite

Senior Site Reliability Engineer - Database Services - Toyota North America

Site Reliability Engineer

Lead the reliability and performance of high‑availability database services, driving automation, monitoring, and incident response across cloud and on‑prem environments using Kubernetes, Prometheus, Grafana, AWS, Terraform, and Python.

About the role

Key Responsibilities

Design, implement, and maintain highly available database clusters (SQL/NoSQL) across hybrid cloud environments.
Develop and maintain CI/CD pipelines, infrastructure as code (Terraform), and automated deployment workflows.
Implement comprehensive monitoring, alerting, and observability using Prometheus, Grafana, and custom dashboards.
Lead incident response, root‑cause analysis, and post‑mortem documentation to improve system resilience.
Collaborate with development, security, and operations teams to enforce best practices and optimize performance.

Requirements

5+ years of SRE or database operations experience in production environments.
Proficiency with Kubernetes and Helm, including running container orchestration at scale.
Strong scripting skills in Python and experience with Terraform or similar IaC tools.
Hands‑on experience with AWS services (RDS, Aurora, EC2, EKS) and on‑prem database technologies.
Excellent problem‑solving skills, strong communication, and a proactive, collaborative mindset.

Skills

Kubernetes, Prometheus, Grafana, AWS, Terraform, Python

Company: Toyota North America

Department: Engineering

Location: Plano, Texas, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 26, 2026