remote
Site Reliability Engineer - ManTech
Drive reliability and scalability for cloud-native services using Kubernetes, Docker, Terraform, and AWS. Automate deployments, monitor performance, and ensure high availability through robust CI/CD pipelines and scripting.
About the role
Key Responsibilities
- Design, deploy, and maintain scalable Kubernetes clusters and Docker containers across AWS environments.
- Write and maintain Terraform configurations for infrastructure as code, ensuring reproducible and auditable deployments.
- Develop and maintain CI/CD pipelines to automate build, test, and release processes.
- Monitor system health, performance, and security using Prometheus, Grafana, and CloudWatch, and respond to incidents.
- Collaborate with development teams to optimize application performance and reliability.
Requirements
- Proven experience with Kubernetes, Docker, and Terraform in production environments.
- Strong scripting skills in Python or Bash for automation.
- Hands‑on experience with AWS services (EKS, EC2, S3, CloudWatch).
- Knowledge of CI/CD tools such as Jenkins, GitLab CI, or Argo CD.
- Excellent problem‑solving skills and ability to work independently in a remote setting.
Skills
kubernetes, docker, terraform, aws, cicd, python