Remote / Onsite
Senior Site Reliability Engineer - OneAdvanced
Site Reliability Engineer
Senior Site Reliability Engineer leading the design, automation, and reliability of Amazon EKS‑based platforms, driving infrastructure standardization and CI/CD toolchain optimization to accelerate software delivery.
About the role
Key Responsibilities
- Architect, build, and operate highly available Amazon EKS clusters that support the organization’s CI/CD pipeline and Harness delegate platform.
- Develop and maintain infrastructure‑as‑code solutions to automate provisioning, configuration, and scaling of cloud resources.
- Implement monitoring, alerting, and incident response processes to ensure platform reliability and rapid resolution of production issues.
- Drive platform standardization and best‑practice guidelines across teams to reduce operational overhead and improve security posture.
- Collaborate with development and product teams to optimize toolchains, improve deployment velocity, and embed reliability into the software delivery lifecycle.
Requirements
- 5+ years of hands‑on experience managing Kubernetes environments, preferably with Amazon EKS.
- Strong background in cloud platform engineering, including automation with Terraform, CloudFormation, or similar IaC tools.
- Proven expertise in CI/CD concepts and tools (e.g., Harness, Jenkins, GitLab CI).
- Deep understanding of observability stacks, incident management, and security best practices in cloud‑native environments.
- Excellent problem‑solving skills and the ability to mentor junior engineers.