Manager, Site Reliability Engineering
Secure Every Identity, from AI to Human
Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
At Okta, our motto is "Always On," and nowhere do we embrace that more than in Technical Operations. We strive to build the most reliable and performant systems on the planet through the skillful use of automation. If you like to be challenged and have a passion for solving large-scale automation, testing, and tuning problems, we would love to hear from you. The ideal candidate exemplifies the ethic of "If you have to do something more than once, automate it" and can rapidly self-educate on new concepts and tools.
You will work on:
● Mentoring, managing, and leading a team of SREs with a broad range of expertise and experience.
● Being an evangelist and advocate for security best practices, leading initiatives and projects to strengthen the security posture of our most critical infrastructure.
● Responding to production incidents, driving them to remediation as quickly as possible, and determining how we can prevent them in the future.
● Triaging and troubleshooting complex production issues to ensure reliability and performance.
● Working closely with stakeholders across the organization to ensure our new capabilities balance the competing constraints of reliability, security, and delivery velocity.
● Partnering directly with recruiting and people ops to hire and retain the best talent in the world.
● Keeping a sharp eye on our metrics, including vulnerability scanning and security posture, cloud spend, RPO and RTO, and toil overhead, and ensuring our projects drive those metrics in the right direction.
● Supporting a 24x7 online environment as part of an on-call rotation.
You are an ideal candidate if you:
● Are always willing to go the extra mile: see a problem, fix the problem.
● Are passionate about encouraging the development of engineering peers and leading by example.
● Have experience managing teams running large-scale production Java/Tomcat and containerized services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers.
● Have deep knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
Minimum Required Knowledge, Skills, Abilities, and Qualities:
● 4+ years of experience managing SRE or SWE teams, ideally in a cloud-native environment.
● 13+ years of experience, with strong leadership, communication, and project management skills.
● Strong security background and knowledge.
● BS in computer science (or equivalent experience).
Posted June 7, 2026