remote
Customer Reliability Engineer - Apache Airflow - Astronomer
Software Engineer
Join a fast‑growing data platform team as a Customer Reliability Engineer, ensuring high availability and performance of Apache Airflow deployments on cloud infrastructure using Python, Kubernetes, CI/CD pipelines, and AWS.
About the role
Key Responsibilities
- Design, implement, and maintain reliable, scalable Apache Airflow environments for enterprise customers.
- Develop automation scripts and CI/CD pipelines (GitHub Actions, Jenkins) to streamline deployment, monitoring, and incident response.
- Collaborate with product, support, and engineering teams to troubleshoot performance bottlenecks, security issues, and infrastructure failures.
- Provide on‑call support and lead post‑mortem analyses, driving continuous improvement of reliability processes.
- Guide customers on best practices for Airflow configuration, DAG design, and cloud resource optimization (AWS, GCP).
Requirements
- 3+ years of hands‑on experience with Apache Airflow in production environments.
- Strong programming skills in Python and familiarity with Linux system administration.
- Proficiency with container orchestration (Kubernetes) and cloud platforms (AWS or GCP).
- Experience building and maintaining CI/CD pipelines and working with infrastructure‑as‑code tools (Terraform, Helm).
- Excellent communication skills and a customer‑focused mindset for translating technical concepts into actionable guidance.
Skills
Python, Kubernetes, CI/CD, AWS, Linux