Remote
Site Reliability Engineer (SRE) - Bright Vision Technologies
Join a remote‑first team as a Site Reliability Engineer, building and maintaining scalable, secure infrastructure on AWS using Kubernetes, Docker, Terraform, and modern CI/CD pipelines.
About the role
Key Responsibilities
- Design, implement, and operate highly available services on AWS, leveraging Kubernetes and Docker containers.
- Develop and maintain infrastructure‑as‑code using Terraform to ensure reproducible environments.
- Build and improve CI/CD pipelines for automated testing, deployment, and rollback.
- Implement monitoring, alerting, and observability solutions with Prometheus, Grafana, and related tools.
- Collaborate with development teams to improve application reliability, performance, and security.
- Participate in on‑call rotations, incident response, and post‑mortem analysis.
Requirements
- 3+ years of experience in site reliability, DevOps, or cloud engineering roles.
- Strong proficiency in Python for automation and scripting.
- Hands‑on experience with Kubernetes, Docker, and AWS services (EC2, RDS, S3, etc.).
- Proven ability to write and manage Terraform modules and CI/CD pipelines (e.g., GitHub Actions, Jenkins).
- Experience with monitoring and alerting tools such as Prometheus, Grafana, or similar.
Skills
Python, Kubernetes, Docker, AWS, Terraform, CI/CD, Prometheus