Remote
AI Data Engineer - Quantifind
Data Engineer
Experienced data engineer specializing in AI‑driven pipelines, knowledge graph construction, and large‑scale ingestion of structured and unstructured data using Python, Spark, Airflow, and cloud services.
About the role
Key Responsibilities
- Design, build, and maintain high‑throughput data ingestion pipelines that transform raw sources into curated knowledge graphs.
- Develop and orchestrate ETL workflows with Apache Airflow, ensuring reliability, scalability, and observability.
- Implement data processing jobs using Spark and Python to handle both structured and unstructured datasets.
- Integrate streaming data via Kafka and manage storage/compute resources on AWS (S3, Redshift, EMR, etc.).
- Collaborate with product and research teams to define ontologies, data quality standards, and documentation frameworks.
Requirements
- 5+ years of professional experience in data engineering, with a focus on AI/ML‑enabled pipelines.
- Proficiency in Python, SQL, and big‑data technologies such as Spark, Airflow, and Kafka.
- Hands‑on experience building and deploying solutions on AWS cloud services.
- Strong understanding of knowledge graph concepts, ontologies, and data provenance.
- Excellent problem‑solving skills and a curiosity‑driven approach to data quality and value.
Skills
Python, SQL, Apache Spark, Kafka, AWS