onsite
Site Reliability Engineer (SRE) / DevOps Engineer - E Space
Lead the reliability and automation of a cutting‑edge low‑Earth‑orbit IoT platform, ensuring high availability, scalability, and secure connectivity across space and terrestrial networks using Kubernetes, CI/CD pipelines, and cloud services.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for a low‑Earth‑orbit IoT platform using Kubernetes and cloud services (AWS/GCP).
- Build and manage CI/CD pipelines to automate testing, deployment, and rollouts for microservices and edge devices.
- Implement robust monitoring, logging, and alerting solutions to ensure 99.9% uptime and rapid incident response.
- Collaborate with software, security, and operations teams to enforce best practices, automate configuration, and optimize performance.
- Drive continuous improvement initiatives, including cost optimization, disaster recovery planning, and capacity forecasting.
Requirements
- 5+ years of experience in SRE or DevOps roles, preferably in satellite or IoT environments.
- Proficiency with Kubernetes, Helm, and container orchestration at scale.
- Strong scripting skills (Python, Bash) and experience with CI/CD tools (GitHub Actions, Jenkins, Argo CD).
- Hands‑on experience with cloud platforms (AWS or GCP) and infrastructure‑as‑code (Terraform, CloudFormation).
- Excellent problem‑solving skills, the ability to work in a fast‑paced, cross‑functional team, and a passion for reliability engineering.