remote
Site Reliability Engineer - Omni Federal
Remote Site Reliability Engineer responsible for designing, automating, and maintaining highly available cloud infrastructure with AWS, Kubernetes, Terraform, and CI/CD pipelines, and for ensuring the performance, security, and reliability of mission‑critical federal applications.
About the role
Key Responsibilities
- Design, implement, and operate scalable, secure, and highly available cloud environments on AWS.
- Develop and maintain infrastructure-as-code using Terraform and automate deployment pipelines with CI/CD tools.
- Manage container orchestration platforms (Kubernetes) and ensure reliable service delivery through monitoring, alerting, and incident response.
- Collaborate with development and security teams to embed reliability and compliance best practices into the software lifecycle.
- Continuously improve system performance, capacity planning, and cost efficiency.
Requirements
- 3+ years of experience in site reliability or DevOps roles, preferably supporting mission‑critical applications.
- Strong proficiency in Python scripting and Linux system administration.
- Hands‑on experience with AWS services, Kubernetes, and Terraform for infrastructure automation.
- Demonstrated ability to build and maintain CI/CD pipelines and implement robust monitoring/alerting solutions.
- Eligibility for a Secret clearance and ability to travel up to 30% of the time.
Skills
Python, Linux, Kubernetes, Terraform, AWS, CI/CD