onsite

Staff Site Reliability Engineer - UKG

Site Reliability Engineer

Lead the design, implementation, and operation of highly available, scalable cloud services using Kubernetes, Docker, and AWS, while driving automation, observability, and incident response excellence.

About the role

Key Responsibilities

Architect, deploy, and maintain production‑grade Kubernetes clusters and containerized workloads across AWS environments.
Implement CI/CD pipelines, infrastructure as code (Terraform), and automated configuration management to accelerate delivery and reduce toil.
Design and enforce robust monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to ensure high availability and rapid incident resolution.
Lead incident investigations, post‑mortems, and continuous improvement initiatives to enhance system reliability and resilience.
Collaborate with development, security, and product teams to embed SRE best practices into the software development lifecycle.

Requirements

10+ years of experience in production site reliability or DevOps roles, with a strong focus on cloud-native technologies.
Deep expertise in Kubernetes, Docker, and AWS services (EKS, EC2, S3, CloudWatch).
Proficient with infrastructure as code tools such as Terraform and configuration management (Ansible, Chef).
Hands‑on experience with monitoring, alerting, and log aggregation platforms (Prometheus, Grafana, ELK).
Strong analytical, problem‑solving, and communication skills, with a proven track record of driving reliability improvements.

Skills

Kubernetes, Docker, AWS, Terraform

Company: UKG

Department: Engineering

Location: Lowell, Massachusetts, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Salary: 186,100

Posted June 22, 2026