remote

Site Reliability Specialist - Descartes Systems Group

Software Engineer

Lead the design, deployment, and operation of scalable, secure cloud infrastructure for logistics solutions, leveraging Kubernetes, Docker, AWS, Terraform, and Python to ensure high availability and performance.

About the role

Key Responsibilities

Architect, deploy, and maintain highly available Kubernetes clusters on AWS, ensuring zero downtime for critical logistics services.
Implement infrastructure-as-code using Terraform, automating provisioning and configuration across multiple environments.
Develop and maintain CI/CD pipelines with GitHub Actions and Jenkins, integrating automated testing, security scanning, and blue‑green deployments.
Design and enforce observability strategies using Prometheus, Grafana, and the ELK stack to monitor performance, detect anomalies, and drive proactive incident response.
Collaborate with development teams to optimize application performance, implement best practices for containerization, and enforce security hardening.

Requirements

5+ years of experience in site reliability engineering or DevOps roles, with a strong focus on cloud-native technologies.
Proficient in Kubernetes, Docker, and AWS services (EKS, EC2, S3, CloudWatch).
Hands‑on experience with Terraform, CI/CD tooling, and scripting in Python or Bash.
Deep understanding of monitoring, logging, and alerting best practices.
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

Kubernetes, Docker, AWS, Terraform, Python

Company: Descartes Systems Group

Department: Engineering

Location: QC, California, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 70,000

Posted June 21, 2026