On-site
Site Reliability / Operations Engineer - Vantor
Systems Engineer
Senior SRE responsible for designing, deploying, and maintaining highly available cloud infrastructure on AWS, orchestrating Kubernetes clusters, automating CI/CD pipelines, and ensuring robust monitoring and security compliance for mission‑critical spatial intelligence services.
About the role
Key Responsibilities
- Design, implement, and manage scalable, highly available infrastructure on AWS, leveraging Terraform and Kubernetes to support real‑time spatial data processing.
- Build and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools to automate application deployments and infrastructure changes.
- Implement comprehensive monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK stack) to ensure system reliability and rapid incident response.
- Collaborate with development, security, and operations teams to enforce best practices, perform root‑cause analysis, and drive continuous improvement.
- Maintain and audit security controls to ensure compliance with U.S. Government security requirements at the TS/SCI level.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with a strong focus on cloud-native technologies.
- Proficiency in AWS services (EC2, RDS, S3, VPC, IAM) and Kubernetes cluster management.
- Hands‑on experience with Terraform, Docker, and CI/CD tooling.
- Solid understanding of monitoring, logging, and incident management practices.
- Active U.S. Government security clearance (TS/SCI) or ability to obtain one.
Skills
Kubernetes, AWS, Terraform, CI/CD