Remote
Staff Site Reliability Engineer, Platform Security - Tesla
Site Reliability Engineer
Lead an end‑to‑end Kubernetes security transformation: audit clusters, harden RBAC, automate self‑healing, and integrate security into CI/CD pipelines across a multi‑cloud environment.
About the role
Key Responsibilities
- Conduct comprehensive security audits of all Kubernetes clusters, identifying RBAC misconfigurations, over‑privileged service accounts, and insecure network policies.
- Design and implement automated remediation workflows, self‑healing mechanisms, and alerting systems to maintain cluster integrity.
- Collaborate with Platform Engineering, DevOps, and MLOps teams to embed security best practices into CI/CD pipelines and infrastructure as code.
- Lead the development of security tooling and dashboards that provide real‑time visibility into cluster health and compliance.
- Drive continuous improvement of security policies, incident response playbooks, and post‑incident analysis.
Requirements
- 10+ years of SRE experience with a focus on Kubernetes security and hardening.
- Deep expertise in cloud platforms (AWS, Azure, GCP) and multi‑cloud architecture.
- Proficiency in scripting (Python, Bash) and in infrastructure automation and deployment tooling (Terraform, Helm, Argo CD).
- Strong knowledge of CI/CD pipelines, container security, and vulnerability management.
- Excellent communication skills and a proven track record of leading security initiatives in large, distributed environments.