Remote
Senior Data Engineer - Outmarket AI
Data Engineer
Senior Data Engineer building scalable, SOC 2 compliant data pipelines for an AI‑first insurance platform, using Python, Spark, and AWS to turn complex documents into actionable insights.
About the role
Key Responsibilities
- Design, develop, and maintain large‑scale data pipelines that ingest, transform, and enrich insurance documents and structured data.
- Implement robust ETL workflows using Apache Spark and Airflow, ensuring data quality, lineage, and auditability.
- Collaborate with data scientists and product teams to expose clean, source‑cited datasets for AI model training and business analytics.
- Optimize performance and cost on AWS (S3, Redshift, Glue, EMR) while maintaining SOC 2 Type II compliance.
- Document architecture, data models, and best practices for internal use and external stakeholders.
Requirements
- 5+ years of experience as a Data Engineer in a regulated industry.
- Proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS data services (S3, Redshift, Glue, EMR) and workflow orchestration (Airflow).
- Strong understanding of data modeling, schema design, and data governance.
- Excellent communication skills and a collaborative mindset.
Skills
Python, SQL, Apache Spark, AWS, Airflow