onsite
PySpark Data Engineer
Data Engineer
Design, develop, and maintain high‑performance data pipelines using PySpark and Apache Spark, focusing on ELT/ETL processes, data modeling, and HL7 healthcare data integration.
About the role
Key Responsibilities
- Design and implement scalable PySpark pipelines for ingesting, transforming, and loading large‑volume datasets.
- Develop and maintain data models that support analytical and operational reporting.
- Build ELT/ETL workflows that integrate HL7 messages and other healthcare data sources.
- Optimize Spark jobs for performance, cost efficiency, and reliability.
- Collaborate with data scientists, analysts, and product teams to ensure data quality and availability.
Requirements
- Strong experience with PySpark and the Apache Spark ecosystem.
- Proficiency in data modeling concepts and in relational and NoSQL databases.
- Hands‑on experience building ELT/ETL pipelines, preferably with healthcare data formats such as HL7.
- Solid programming skills in Python and SQL.
- Ability to troubleshoot performance issues and implement best practices for data processing at scale.