remote

Site Reliability Engineer - Aalyria Careers

Drive reliability and scalability of cloud-native services using AWS, Kubernetes, and automation tools. Collaborate with development teams to design, deploy, and maintain highly available infrastructure, ensuring performance, security, and continuous improvement.

About the role

Key Responsibilities

Design, implement, and maintain scalable, highly available infrastructure on AWS using Terraform and CloudFormation.
Manage Kubernetes clusters, ensuring optimal resource utilization, rolling updates, and zero-downtime deployments.
Develop and maintain CI/CD pipelines with GitHub Actions, Jenkins, or ArgoCD to automate build, test, and release processes.
Implement robust monitoring, logging, and alerting using Prometheus, Grafana, the ELK stack, and CloudWatch.
Collaborate with development teams to troubleshoot production incidents, perform root cause analysis, and drive post‑mortem improvements.
Enforce security best practices, including IAM policies, network segmentation, and vulnerability scanning.

Requirements

3+ years of experience in site reliability or DevOps roles.
Proficient with AWS services (EC2, RDS, S3, VPC, IAM) and infrastructure-as-code tools.
Hands‑on experience with Docker and with container orchestration on Kubernetes.
Strong scripting skills in Python or Bash for automation.
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

AWS, Kubernetes, Docker, Terraform, Python, CI/CD

Company: Aalyria Careers

Department: Engineering

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026