Site Reliability Engineer (Remote) - UST
Site Reliability Engineer
Join a remote SRE team supporting a top‑tier banking client, driving reliability and automation on cloud platforms using Kubernetes, Docker, AWS, Terraform, and Python.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for a major banking application.
- Develop and manage CI/CD pipelines to automate deployment, testing, and release processes.
- Monitor system performance and reliability using Prometheus, Grafana, and alerting tools, responding to incidents promptly.
- Collaborate with development and security teams to embed reliability, observability, and compliance into the software lifecycle.
- Automate infrastructure provisioning and configuration management with Terraform and Python scripts.
- Continuously improve site reliability through capacity planning, performance tuning, and post‑incident reviews.
Requirements
- 3+ years of experience in Site Reliability Engineering or DevOps roles.
- Strong hands‑on expertise with Kubernetes, Docker, and AWS services.
- Proficiency in infrastructure‑as‑code tools such as Terraform and scripting in Python.
- Experience building and maintaining CI/CD pipelines and monitoring solutions (e.g., Prometheus, Grafana).
- Solid understanding of networking, security best practices, and incident management processes.
Skills
Kubernetes, Docker, AWS, Terraform, Python, CI/CD, Prometheus