onsite
Jr./Sr. Site Reliability Engineer - 17live
Site Reliability Engineer
Join the team as a Jr./Sr. Site Reliability Engineer and drive the reliability, automation, and scalability of our web services. You will work with Kubernetes, Docker, CI/CD pipelines, and cloud platforms such as AWS, monitor systems with Prometheus, and script in Python.
About the role
Key Responsibilities
- Design, implement, and maintain highly available web infrastructure on Kubernetes clusters.
- Automate deployment pipelines using CI/CD tools and scripting languages.
- Monitor system health with Prometheus, Grafana, and alerting frameworks.
- Collaborate with development teams to optimize application performance and reliability.
- Respond to incidents, conduct root‑cause analysis, and implement preventive measures.
Requirements
- Experience with Docker containers and container orchestration on Kubernetes.
- Proficiency in CI/CD tooling (GitHub Actions, Jenkins, GitLab CI).
- Strong scripting skills in Python or Bash.
- Hands‑on knowledge of AWS services (EC2, EKS, CloudWatch).
- Familiarity with monitoring tools such as Prometheus and Grafana.
Skills
Kubernetes, Docker, CI/CD, AWS, Prometheus, Python