remote
Principal Data Engineer - AI Program - Mayo Clinic
Data Engineer
Lead the design and implementation of scalable AI data pipelines, leveraging Python, Spark, SQL, and AWS to deliver high‑quality data solutions that power Mayo Clinic’s advanced analytics and machine learning initiatives.
About the role
Key Responsibilities
- Architect, develop, and maintain large‑scale data pipelines for AI and machine learning workloads using Python, Apache Spark, and SQL.
- Collaborate with data scientists and product teams to translate analytical requirements into robust, production‑ready data solutions.
- Design and implement data ingestion, transformation, and storage strategies on AWS (S3, Redshift, Glue, Athena).
- Ensure data quality, governance, and security across all data assets, adhering to HIPAA and institutional policies.
- Mentor and guide junior engineers, fostering best practices in coding, testing, and documentation.
Requirements
- 10+ years of experience in data engineering, with a strong focus on AI/ML data pipelines.
- Proficiency in Python, Apache Spark, and SQL; hands-on experience with AWS data services such as S3, Redshift, Glue, and Athena.
- Deep understanding of data modeling, ETL processes, and data lake architecture.
- Strong problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
- Excellent communication skills and a passion for mentoring others.
Skills
Python, Apache Spark, SQL, AWS, machine learning