Remote
Associate Site Reliability Engineer - I4DM
Site Reliability Engineer
The Associate Site Reliability Engineer builds and maintains scalable, highly available infrastructure for federal clients using Kubernetes, Docker, AWS, Terraform, and Python. The role focuses on automation, monitoring, and continuous improvement in a fast‑paced, collaborative environment.
About the role
Key Responsibilities
- Design, deploy, and manage containerized applications on Kubernetes clusters across AWS environments.
- Implement infrastructure as code with Terraform, ensuring repeatable and auditable deployments.
- Automate operational tasks using Python scripts and CI/CD pipelines.
- Monitor system health, performance, and security with Prometheus, Grafana, and CloudWatch.
- Collaborate with development and security teams to troubleshoot incidents and implement post‑mortem improvements.
Requirements
- 1–2 years of experience in site reliability or DevOps roles.
- Hands‑on experience with Terraform and IaC best practices.
- Strong scripting skills in Python and familiarity with CI/CD tools (GitHub Actions, Jenkins).
- Excellent problem‑solving skills and a collaborative mindset.
Skills
Kubernetes, Docker, AWS, Terraform, Python