onsite
Site Reliability Engineer - Platforms - Toyota North America
Lead platform reliability for a global automotive finance service, building scalable, automated infrastructure on AWS, Kubernetes, and Terraform while ensuring high availability, performance, and security.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable platform services on AWS using Kubernetes and Docker containers.
- Automate infrastructure provisioning and configuration with Terraform, ensuring repeatable, versioned deployments.
- Develop and maintain CI/CD pipelines, and build monitoring and alerting that detect incidents quickly and support fast remediation.
- Collaborate with development, security, and operations teams to enforce best practices and improve system resilience.
- Analyze performance metrics, conduct capacity planning, and drive continuous improvement initiatives.
Requirements
- 3+ years of experience in site reliability engineering or DevOps roles.
- Proficiency with AWS services (EC2, EKS, RDS, CloudWatch) and container orchestration.
- Hands‑on experience with Terraform, CI/CD tools, and monitoring solutions.
- Strong scripting skills in Python or Bash.
- Excellent problem‑solving, communication, and teamwork abilities.
Skills
Kubernetes, Docker, AWS, Terraform