remote

Senior Platform/Site Reliability Engineer - Lemon.io

Site Reliability Engineer

Senior Platform/Site Reliability Engineer responsible for designing, deploying, and maintaining highly available, scalable infrastructure using Kubernetes, Docker, AWS, Terraform, and CI/CD pipelines, while ensuring robust monitoring and incident response for a remote, high‑growth startup ecosystem.

About the role

Key Responsibilities

Design, implement, and manage scalable, highly available Kubernetes clusters across AWS environments.
Automate infrastructure provisioning and configuration using Terraform and CI/CD pipelines.
Develop and maintain Docker images, Helm charts, and deployment scripts for microservices.
Implement comprehensive monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK).
Lead incident response, root‑cause analysis, and post‑mortem documentation to improve reliability.
Collaborate with development teams to enforce best practices for performance, security, and cost optimization.

Requirements

5+ years of experience in platform or site reliability engineering.
Proficient with Kubernetes, Docker, and cloud-native tooling.
Hands‑on experience with AWS services (EKS, EC2, S3, RDS).
Strong scripting skills in Bash, Python, or Go.
Experience with Terraform, Helm, and CI/CD tools (GitHub Actions, Jenkins, ArgoCD).

Skills

Kubernetes, Docker, AWS, Terraform, CI/CD

Company: Lemon.io

Department: Engineering

Location: United States

Experience: 4+ years

Tenure: Full-time

Level: Senior

Posted June 18, 2026