onsite
Data Engineer, League Analytics & Infrastructure - Major League Baseball
Build and scale MLB's cloud‑native data platform, designing robust pipelines, dbt transformations, and Airflow orchestration on Google Cloud to support real‑time analytics for player tracking and baseball operations.
About the role
Key Responsibilities
- Design, develop, and maintain scalable data pipelines that ingest and process millions of pitch‑level events daily.
- Implement and evolve the dbt transformation layer to provide clean, reliable data models for analytics consumers.
- Harden and automate Airflow orchestration, ensuring reliable scheduling, monitoring, and alerting of data workflows.
- Collaborate with data scientists, analysts, and baseball operations stakeholders to translate business needs into technical solutions.
- Apply infrastructure‑as‑code practices to manage GCP resources, improve reliability, and reduce operational overhead.
Requirements
- 3+ years of hands‑on experience building data pipelines and ETL processes in a cloud environment, preferably GCP.
- Proficiency in Python for data engineering tasks and strong SQL skills for data modeling and analysis.
- Experience with dbt for transformation pipelines and Apache Airflow for workflow orchestration.
- Solid understanding of cloud‑native data lakehouse architectures and best practices for data quality, security, and performance.
- Ability to work cross‑functionally, communicate technical concepts clearly, and deliver production‑ready solutions on schedule.