Senior Site Reliability Engineer Remote Build - Jobgether
Site Reliability Engineer
Lead the design and operation of a highly available, scalable platform that powers AI‑driven services across global employment infrastructure, using Kubernetes, Docker, CI/CD pipelines, and cloud observability tools.
About the role
Key Responsibilities
- Architect, deploy, and maintain a resilient Kubernetes‑based platform that supports AI‑driven services across multiple regions.
- Design and implement CI/CD pipelines with GitOps principles, ensuring rapid, reliable releases.
- Monitor system health using Prometheus, Grafana, and custom alerts; conduct post‑incident reviews and root‑cause analysis.
- Automate infrastructure provisioning and configuration with Terraform and cloud‑native tools.
- Collaborate with security teams to enforce compliance requirements and secure the deployment pipelines.
- Mentor junior engineers and drive continuous improvement of SRE practices.
Requirements
- 5+ years of experience in site reliability engineering or DevOps roles.
- Deep expertise with Kubernetes, Docker, and container orchestration.
- Proficient in CI/CD tooling (GitHub Actions, GitLab CI, ArgoCD) and GitOps workflows.
- Hands‑on experience with AWS services (EKS, ECS, CloudWatch, IAM).
- Strong scripting skills (Python, Bash) and infrastructure as code (Terraform, CloudFormation).
Skills
Kubernetes, Docker, CI/CD, AWS, Prometheus, Grafana, Terraform