remote

Senior AI Data Engineer - TechBiz Global

Data Engineer

Senior AI Data Engineer responsible for designing, building, and scaling ETL/ELT pipelines for AI workloads, transforming unstructured data into vectorized formats for LLM consumption, and automating the data-to-model lifecycle on AWS.

About the role

At TechBiz Global, we provide recruitment services to the top clients in our portfolio.

We are currently looking for a dedicated Senior AI Data Engineer to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Responsibilities

▪ Design, build, and scale robust ETL/ELT pipelines optimized for AI workloads, including RAG, fine-tuning, and batch inference.

▪ Transform unstructured data sources such as PDFs, logs, and transcripts into structured and vectorized formats suitable for LLM consumption.

▪ Maintain and automate the data-to-model lifecycle, ensuring AI knowledge bases remain synchronized with changing business data.

▪ Develop and maintain real-time feature pipelines that support low-latency AI and machine learning applications.

▪ Integrate data platforms with Kafka and other event-driven systems to enable real-time processing and AI-driven responses.

▪ Manage and optimize Feature Stores to ensure consistency between model training and production environments.

▪ Implement automated data quality controls and validation processes to ensure the reliability and accuracy of AI training and inference data.

▪ Establish and maintain data lineage frameworks to provide traceability, auditability, and regulatory compliance across data workflows.

▪ Enforce data security, privacy, and governance standards, including PII protection and compliance with industry regulations.

▪ Manage data movement and synchronization across on-premises systems, cloud platforms, and data warehouses.

▪ Optimize data storage and retrieval strategies for Vector Databases to support high-performance RAG and AI search workloads.

▪ Collaborate with Data Scientists, ML Engineers, Software Engineers, and business stakeholders to deliver scalable AI data solutions.

Requirements

▪ 10+ years of experience in Data Engineering or Backend Engineering with a strong focus on data platforms and pipelines.

▪ 2+ years of hands-on experience supporting AI/ML data pipelines, including data preparation for machine learning and generative AI applications.

▪ Expert-level proficiency in Python and SQL; experience with Java or Scala is an advantage.

▪ Strong experience building and maintaining real-time data streaming solutions using Apache Kafka, Flink, or Spark Streaming.

▪ Hands-on experience with modern data orchestration and transformation tools such as Airflow, dbt, and Prefect.

▪ Experience working with Vector Databases and Feature Stores to support AI and machine learning workloads.

▪ Strong knowledge of cloud-based data services on AWS, Azure, or GCP, including services such as Glue, Kinesis, Data Factory, or Dataflow.

▪ Experience deploying and managing data workloads in Kubernetes (K8s) environments.

▪ P