remote
Principal Analyst, Data Integration - H1
Software Engineer
Lead end‑to‑end data integration initiatives: architect scalable ETL pipelines on AWS and use Python, Spark, and SQL to transform and enrich healthcare data for AI‑driven insights.
About the role
Key Responsibilities
- Design, develop, and maintain robust ETL pipelines that ingest, cleanse, and transform large volumes of healthcare data from diverse sources.
- Collaborate with data scientists and product teams to define data models and ensure data quality for AI and analytics workloads.
- Optimize data workflows on AWS services (Glue, Redshift, S3) and Spark clusters for performance and cost efficiency.
- Implement monitoring, logging, and alerting to guarantee pipeline reliability and rapid issue resolution.
- Document data lineage, metadata, and integration specifications for compliance and audit purposes.
Requirements
- 10+ years of experience in data engineering or analytics, with a strong focus on data integration.
- Proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience building ETL pipelines on AWS (Glue, Redshift, S3, Lambda).
- Deep understanding of data warehousing concepts and best practices.
- Excellent communication skills and the ability to explain complex technical concepts to non‑technical stakeholders.