remote

Site Reliability Engineer - By Light Professional IT Services


The Site Reliability Engineer is responsible for designing, deploying, and maintaining highly available cloud infrastructure using Kubernetes, Docker, AWS, and Terraform, and for ensuring the performance, reliability, and security of mission‑critical applications.

About the role

Key Responsibilities

Design, implement, and manage scalable Kubernetes clusters on AWS, ensuring high availability and fault tolerance.
Automate infrastructure provisioning and configuration using Terraform, keeping all infrastructure definitions version-controlled and reproducible.
Develop and maintain CI/CD pipelines that handle application deployment, monitoring, and rollback.
Implement robust monitoring, logging, and alerting solutions (Prometheus, Grafana, CloudWatch) to detect and resolve incidents proactively.
Collaborate with development teams to optimize application performance, security, and cost efficiency.

Requirements

3+ years of experience in site reliability or DevOps roles with a focus on cloud-native technologies.
Hands‑on expertise with Kubernetes, Docker, and AWS services (EKS, EC2, S3, RDS).
Proficiency in infrastructure-as-code using Terraform or similar tools.
Strong scripting skills (Python, Bash) and experience with CI/CD tools (Jenkins, GitHub Actions).
Excellent problem‑solving abilities and a proactive approach to incident management.

Skills

Kubernetes, Docker, AWS, Terraform

Company: By Light Professional IT Services

Department: Engineering

Location: Orlando, FL, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026