Remote
Data Engineer - Caterpillar
Data Engineer responsible for designing, building, and maintaining scalable data pipelines and lakehouse architecture using Python, SQL, AWS, and Spark to enable advanced analytics and machine learning across the organization.
About the role
Key Responsibilities
- Design, develop, and maintain robust data pipelines that ingest, transform, and store large volumes of structured and unstructured data in cloud data lakes.
- Implement data modeling best practices to support downstream analytics, reporting, and machine learning initiatives.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver high‑quality, reproducible datasets.
- Optimize pipeline performance using Spark, SQL, and AWS services such as S3, Glue, and Redshift.
- Ensure data quality, lineage, and security compliance across all data assets.
Requirements
- 3+ years of experience as a data engineer or in a similar role.
Skills
Python, SQL, AWS, Apache Spark