Remote
Site Reliability Engineer - Ivanti
About the role
Senior Site Reliability Engineer responsible for designing, deploying, and maintaining scalable, highly available infrastructure built on Kubernetes, Docker, Terraform, and AWS. The role automates operations with Python and CI/CD pipelines and owns monitoring and incident response.
Key Responsibilities
- Design, implement, and manage Kubernetes clusters and containerized workloads across AWS environments.
- Automate infrastructure provisioning and configuration using Terraform and Python scripts.
- Develop and maintain CI/CD pipelines to streamline application deployments and updates.
- Implement comprehensive monitoring, alerting, and logging solutions to ensure system reliability and performance.
- Collaborate with development teams to optimize application architecture for scalability and resilience.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to continuously improve reliability.
Requirements
- 5+ years of experience in site reliability engineering or related roles.
- Proficiency with Kubernetes, Docker, and container orchestration best practices.
- Hands‑on experience with Terraform, AWS services (EC2, EKS, RDS, S3), and cloud networking.
- Strong scripting skills in Python and familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo CD.
- Deep understanding of monitoring and logging tools (Prometheus, Grafana, the ELK stack) and of incident management processes.
Skills
Kubernetes, Docker, Terraform, AWS, Python, CI/CD