Principal Site Reliability Engineer (Remote) - Collins Aerospace
Principal Site Reliability Engineer leading scalable, resilient aviation services using Kubernetes, Docker, AWS, Terraform, and CI/CD pipelines, ensuring high availability and performance for connected aviation ecosystems.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for aviation messaging and flight operations services.
- Lead automation of deployment pipelines using CI/CD tools and infrastructure-as-code (Terraform, CloudFormation).
- Implement observability, monitoring, and alerting across Kubernetes clusters and cloud resources.
- Collaborate with development teams to enforce best practices for reliability, security, and performance.
- Drive incident response, post‑mortem analysis, and continuous improvement of SRE processes.
Requirements
- 10+ years of experience in site reliability or DevOps roles, with a strong focus on cloud-native technologies.
- Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, S3, CloudWatch).
- Hands‑on experience with Terraform, CI/CD pipelines, and scripting in Python or Bash.
- Deep understanding of monitoring, logging, and alerting tools (Prometheus, Grafana, ELK).
- Excellent communication skills and a proven ability to mentor and lead cross‑functional teams.
Skills
Kubernetes, Docker, AWS, Terraform, CI/CD, Python