Remote
Site Reliability Engineer - Vynca
Drive scalable, resilient infrastructure for a care‑tech platform, using Kubernetes, Docker, CI/CD pipelines, AWS, Prometheus monitoring, and Python scripting to ensure high availability and rapid incident response.
About the role
Key Responsibilities
- Design, deploy, and maintain Kubernetes clusters and containerized services for a high‑traffic care platform.
- Implement and manage CI/CD pipelines to automate build, test, and release processes.
- Configure and monitor infrastructure with Prometheus, Grafana, and alerting systems to meet a 99.9% uptime target.
- Collaborate with development teams to optimize application performance and reliability.
- Respond to incidents, conduct post‑mortems, and drive continuous improvement of SRE practices.
Requirements
- 3+ years of experience in site reliability or DevOps roles.
- Hands‑on expertise with Kubernetes, Docker, and cloud platforms (AWS preferred).
- Strong scripting skills in Python and Bash for automation.
- Experience with monitoring tools such as Prometheus, Grafana, and alerting frameworks.
- Excellent problem‑solving skills and a proactive, collaborative mindset.
Skills
Kubernetes, Docker, CI/CD, AWS, Prometheus, Python