remote
Site Reliability Engineer - Digitalxc.com
Senior Site Reliability Engineer with 8–10 years of experience driving Kubernetes cluster management, multi‑cloud deployments, CI/CD pipeline design, and AWS cost‑optimization strategies for high‑availability cloud‑native applications.
About the role
Key Responsibilities
- Design, deploy, and maintain Kubernetes clusters (K3s, EKS, AKS) in production, ensuring scalability, reliability, and performance.
- Lead multi‑cloud application deployments across AWS, Azure, and other providers, applying best practices for high availability and resilience.
- Build and optimize CI/CD pipelines that automate build, test, and release processes for cloud‑native services.
- Implement cost‑optimization measures in AWS by monitoring usage, rightsizing resources, and leveraging Reserved Instances and Spot Fleets.
- Collaborate with development teams to troubleshoot performance issues, enforce security policies, and improve observability.
Requirements
- 8–10 years of SRE or DevOps experience with a strong focus on Kubernetes.
- Proven expertise in AWS and Azure cloud environments, including infrastructure as code.
- Hands‑on experience designing CI/CD pipelines using tools such as GitHub Actions, GitLab CI, or Jenkins.
- Deep understanding of cloud‑native architecture, monitoring, and cost‑management practices.
- Excellent problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
Skills
Kubernetes, AWS, Azure, CI/CD