onsite
Principal Engineer, Data Infrastructure
Software Engineer
Lead design and implementation of scalable data infrastructure, building robust pipelines with Apache Airflow, Beam, Flink, and Hudi while guiding engineering teams and ensuring high‑performance data delivery.
About the role
Key Responsibilities
- Architect, develop, and maintain large‑scale data pipelines using Apache Airflow, Beam, Flink, and Hudi.
- Define and enforce best practices for data modeling, ingestion, transformation, and storage across distributed systems.
- Collaborate with data scientists, product managers, and platform engineers to translate business requirements into reliable data solutions.
- Mentor senior and staff engineers, fostering a culture of code quality, testing, and continuous improvement.
- Drive performance tuning, capacity planning, and cost optimization for data workloads.
Requirements
- 10+ years of experience building data infrastructure at scale, with deep expertise in Apache Airflow, Beam, Flink, and Hudi.
- Strong programming skills in Python (or Java/Scala) and solid SQL knowledge.
- Proven track record designing distributed, fault‑tolerant systems and optimizing workloads that process large data volumes.
- Experience leading technical teams, conducting design reviews, and delivering production‑grade solutions.
- Excellent problem‑solving abilities and communication skills to work cross‑functionally.
Skills
Apache Beam, Apache Flink, SQL, Python