Remote
Senior Site Reliability Engineer AWS - UST
Site Reliability Engineer
Senior Site Reliability Engineer specializing in AWS cloud platforms, automation, and container orchestration. Drives reliability, performance, and scalability through IaC, CI/CD pipelines, and proactive monitoring.
About the role
Key Responsibilities
- Design, implement, and maintain highly available AWS infrastructure using Terraform and CloudFormation.
- Develop and manage CI/CD pipelines (Jenkins, GitLab CI) to automate build, test, and deployment processes.
- Operate and troubleshoot Kubernetes clusters, ensuring optimal performance and resilience.
- Implement monitoring, alerting, and observability solutions (Prometheus, Grafana, CloudWatch) to proactively detect issues.
- Collaborate with development and product teams to embed reliability best practices into the software lifecycle.
Requirements
- 5+ years of experience in site reliability or DevOps engineering, with deep expertise in AWS services.
- Strong proficiency in scripting/automation using Python or Bash.
- Hands‑on experience with container orchestration (Kubernetes) and infrastructure‑as‑code tools (Terraform, CloudFormation).
- Solid understanding of Linux systems, networking, and security best practices.
- Proven track record of building and maintaining CI/CD pipelines and monitoring frameworks.
Skills
AWS, Kubernetes, Terraform, Python, CI/CD, Linux