remote

Principal Data Engineer - AI Program - Mayo Clinic

Data Engineer

Lead the design and implementation of scalable AI data pipelines, leveraging Python, Spark, SQL, and AWS to deliver high‑quality data solutions that power Mayo Clinic’s advanced analytics and machine learning initiatives.

About the role

Key Responsibilities

Architect, develop, and maintain large‑scale data pipelines for AI and machine learning workloads using Python, Apache Spark, and SQL.
Collaborate with data scientists and product teams to translate analytical requirements into robust, production‑ready data solutions.
Design and implement data ingestion, transformation, and storage strategies on AWS (S3, Redshift, Glue, Athena).
Ensure data quality, governance, and security across all data assets, adhering to HIPAA and institutional policies.
Mentor and guide junior engineers, fostering best practices in coding, testing, and documentation.

Requirements

10+ years of experience in data engineering, with a strong focus on AI/ML data pipelines.
Proficiency in Python, Apache Spark, and SQL; experience with AWS data services.
Deep understanding of data modeling, ETL processes, and data lake architecture.
Strong problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
Excellent communication skills and a passion for mentoring others.

Skills

Python, Apache Spark, SQL, AWS, machine learning

Company: Mayo Clinic

Department: Engineering

Location: Rochester, Minnesota, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 225,492

Posted June 25, 2026