remote
Lead AI Data Pipeline Engineer
About the role
As a Lead AI Data Pipeline Engineer, you will design and implement robust data pipelines that prepare high-quality training data for AI models. The work includes building data curation workflows, designing data quality frameworks, and collaborating with ML engineers on data requirements.
Responsibilities
- Lead the design and implementation of data pipelines that prepare high-quality training data for AI models.
- Build data curation workflows that transform raw enterprise data into labeled, validated datasets.
- Design data quality frameworks: validation, profiling, anomaly detection, lineage tracking.
- Extend existing anonymized data export pipelines to support AI training workloads.
- Implement synthetic data generation pipelines.
- Design schema mappings across 197+ enterprise tables for feature extraction.
- Collaborate with ML engineers on training data format requirements.
- Establish data catalog and metadata management for AI training artifacts.
Skills
Data Pipelines, AI Models, Data Curation, Data Quality Frameworks, Validation, Profiling, Anomaly Detection, Lineage Tracking, Anonymized Data Export Pipelines, Synthetic Data Generation, Schema Mappings, Feature Extraction, ML Engineers, Data Catalog, Metadata Management