onsite
Senior Manager, Site Reliability Engineering - Oracle
Software Engineer
Lead a high‑performing SRE team to build reliable, secure, and AI‑driven healthcare cloud services, driving automation, performance, and operational excellence for federal customers.
About the role
Key Responsibilities
- Lead and mentor a cross‑functional SRE team focused on reliability, performance, and security of Oracle Health AI services.
- Design and implement automation frameworks to accelerate deployment, monitoring, and incident response.
- Collaborate with product, security, and AI teams to embed resilience and compliance into the software‑defined operating model.
- Drive continuous improvement initiatives, including capacity planning, cost optimization, and incident post‑mortems.
- Establish and enforce best practices for observability, alerting, and service level objectives across the cloud platform.
Requirements
- 10+ years of experience in large‑scale cloud operations with a focus on site reliability.
- Proficiency in cloud platforms (AWS, Azure, or GCP) and container orchestration (Kubernetes).
- Strong background in automation tools (Terraform, Ansible, or similar) and CI/CD pipelines.
- Deep understanding of security principles, compliance, and data protection in healthcare environments.
- Excellent leadership, communication, and stakeholder management skills.
Skills
machine learning, kubernetes, terraform