remote
Senior Site Reliability Engineer Remote Build
Site Reliability Engineer
We are hiring a Senior Site Reliability Engineer to lead scalable, secure cloud infrastructure for a global remote‑employment platform, driving automation, observability, and high availability across Kubernetes, Docker, AWS, and Terraform environments.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Terraform and Kubernetes.
- Automate deployment pipelines with CI/CD tools, ensuring rapid, reliable releases.
- Implement monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to maintain service health.
- Collaborate with development teams to optimize application performance and resilience.
- Lead incident response, root‑cause analysis, and post‑mortem documentation.
Requirements
- 5+ years of SRE or DevOps experience in a cloud‑native environment.
- Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, S3, RDS).
- Strong scripting skills in Python or Bash and experience with Terraform.
- Hands‑on experience with monitoring tools (Prometheus, Grafana) and log aggregation.
- Excellent problem‑solving skills and a proactive, collaborative mindset.
Skills
Kubernetes, Docker, AWS, Terraform