Remote

Staff Engineer, Site Reliability - BABYLIST

Software Engineer

Lead the engineering of highly available, scalable infrastructure for a fast‑growing consumer platform, driving automation, observability, and reliability across AWS, Kubernetes, and cloud-native tooling.

About the role

Key Responsibilities

Design, build, and maintain production‑grade infrastructure for a high‑traffic consumer platform using AWS, Kubernetes, and Terraform.
Implement and evolve CI/CD pipelines that enable rapid, reliable, zero‑downtime deployments.
Develop and maintain the observability stack (metrics, logs, traces) to detect, diagnose, and remediate incidents proactively.
Collaborate with cross‑functional teams to define SLOs, SLIs, and incident response processes.
Mentor and guide junior engineers on best practices in reliability, automation, and cloud architecture.

Requirements

5+ years of experience in site reliability or DevOps roles at high‑scale SaaS or e‑commerce companies.
Proficiency with AWS services (EC2, ECS/EKS, RDS, S3, CloudWatch) and Kubernetes cluster management.
Strong scripting skills in Python or Go, and infrastructure-as-code experience with Terraform.
Hands‑on experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD) and monitoring/alerting platforms (Prometheus, Grafana, Datadog).
Excellent problem‑solving, communication, and collaboration skills in a fast‑moving, remote environment.

Skills

Kubernetes, AWS, Python, Go, Terraform, CI/CD

Company: BABYLIST

Department: Engineering

Location: United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 271,991

Posted June 19, 2026