Remote

Lead Cloud Operations & SRE Engineer - UST

Site Reliability Engineer

Lead the Cloud Operations and Site Reliability Engineering function, defining technical strategy, automating infrastructure, and ensuring high‑availability services on AWS using Kubernetes, Terraform, CI/CD pipelines, and advanced monitoring.

About the role

Key Responsibilities

Define and drive the technical roadmap for cloud operations and SRE across multiple production environments.
Architect, implement, and maintain highly available, scalable infrastructure on AWS using Kubernetes, Terraform, and IaC best practices.
Design, build, and optimize CI/CD pipelines to enable rapid, reliable deployments.
Implement comprehensive monitoring, alerting, and incident response processes to achieve SLO/SLA targets.
Mentor and lead a team of engineers, fostering a culture of automation, reliability, and continuous improvement.

Requirements

5+ years of hands‑on experience in cloud operations, site reliability engineering, or related roles.
Deep expertise with AWS services, Kubernetes orchestration, and infrastructure‑as‑code tools such as Terraform.
Proficiency in scripting or programming (e.g., Python) for automation and tooling.
Strong background in CI/CD pipeline creation, monitoring solutions (Prometheus, Grafana, CloudWatch), and incident management.
Demonstrated leadership ability to guide technical teams and influence cross‑functional stakeholders.

Skills

AWS, Kubernetes, Terraform, CI/CD, Python

Company: UST

Department: Operations

Location: United Kingdom

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted June 23, 2026