remote
Senior Site Reliability Engineer - Release - Alkami Technology
Site Reliability Engineer
Senior Site Reliability Engineer focused on release engineering: building automated deployment pipelines, managing Kubernetes clusters on AWS, and ensuring high availability through infrastructure-as-code and robust monitoring.
About the role
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines for microservice releases across multiple environments.
- Manage and scale Kubernetes clusters on AWS, ensuring reliability, security, and performance.
- Develop infrastructure‑as‑code using Terraform to provision and version cloud resources.
- Implement monitoring, alerting, and observability solutions (e.g., Prometheus, Grafana) to detect and resolve incidents quickly.
- Collaborate with development and product teams to define release processes, rollback strategies, and post‑deployment validation.
Requirements
- 5+ years of SRE or DevOps experience in a cloud‑native environment.
- Strong proficiency with Kubernetes, AWS services, and Terraform.
- Hands‑on scripting/automation skills in Python or similar languages.
- Experience building and maintaining CI/CD pipelines (Jenkins, GitLab CI, or similar).
- Solid understanding of monitoring, logging, and incident response best practices.
Skills
Kubernetes, AWS, Terraform, Python, CI/CD