onsite

PySpark Data Engineer

Data Engineer

Design, develop, and maintain high‑performance data pipelines using PySpark and Apache Spark, focusing on ELT/ETL processes, data modeling, and HL7 healthcare data integration.

About the role

Key Responsibilities

Design and implement scalable PySpark pipelines for ingesting, transforming, and loading large‑volume datasets.
Develop and maintain data models that support analytical and operational reporting.
Build ELT/ETL workflows that integrate HL7 messages and other healthcare data sources.
Optimize Spark jobs for performance, cost efficiency, and reliability.
Collaborate with data scientists, analysts, and product teams to ensure data quality and availability.

Requirements

Strong experience with PySpark and the Apache Spark ecosystem.
Proficiency in data modeling concepts and relational/NoSQL databases.
Hands‑on experience building ETL/ELT pipelines, preferably with healthcare data formats such as HL7.
Solid programming skills in Python and SQL.
Ability to troubleshoot performance issues and implement best practices for data processing at scale.

Skills

Apache Spark

Department: Engineering

Location: IN-TN-Chennai, Tamil Nadu, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 25, 2026