Remote
Data Solutions Engineer - Abaka AI
Software Engineer
Lead end‑to‑end data pipeline development for AI workloads, leveraging Python, SQL, Spark, and AWS services to deliver scalable, high‑quality data solutions that power generative and embodied AI applications.
About the role
Key Responsibilities
- Design, build, and maintain robust data pipelines that ingest, transform, and serve large volumes of structured and unstructured data for AI models.
- Implement data quality checks, monitoring, and alerting to ensure pipeline reliability and performance.
- Collaborate with data scientists and product teams to understand data requirements and translate them into scalable engineering solutions.
- Optimize data workflows using Spark, SQL, and AWS services (S3, Redshift, Glue, Athena) for cost‑effective, high‑throughput processing.
- Document architecture, data schemas, and best practices for future maintenance and onboarding.
Requirements
- 5+ years of experience in data engineering or related roles, with a strong focus on AI/ML data pipelines.
- Hands‑on experience with AWS data services (S3, Redshift, Glue, Athena) and workflow orchestration tools such as Airflow.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and ability to work collaboratively in a fast‑paced, cross‑functional environment.
Skills
Python, SQL, AWS, Apache Spark, Airflow