remote
AI Data Lead
As an AI Data Lead, you will own the end-to-end Client & AI Data Vault, from ingestion to AI retrieval, and be responsible for building and scaling vector databases and RAG infrastructure. You will also prototype chunking and embedding strategies, develop parsers for complex documents, and design robust data models.
About the role
About the Company
Deeprec.ai is proud to announce our partnership with a leading AI Compliance company. They use a modern tech stack to help companies stay on track with laws and regulations worldwide. In this remote AI Data Lead role, you will join a forward-thinking engineering culture that embraces AI tools as practical productivity accelerators and innovation enablers.
What You’ll Do
- Own the end-to-end Client & AI Data Vault, from ingestion to AI retrieval.
- Build and scale vector databases and RAG infrastructure in production.
- Prototype chunking and embedding strategies using real client data and AI coding tools.
- Develop parsers for complex documents including PDFs, DOCX, spreadsheets, and scans.
- Design data models connecting client content to regulatory concepts and gap analysis.
- Maintain high standards for data quality, performance, testing, and engineering practices.
Requirements
- Production experience with vector databases (e.g. Qdrant, Pinecone, Weaviate, pgvector), including tuning for performance and recall.
- Experience building chunking and embedding pipelines for complex documents.
- Strong SQL and data modelling skills in production systems.
- Experience extracting data from PDFs, DOCX, and scanned documents (incl. OCR/layout-aware parsing).
- Strong Python plus at least one systems-level language.
- Experience with Azure (preferred) or AWS/GCP, CI/CD, and containers.
- Hands-on experience with RAG or hybrid retrieval systems.
- Effective use of AI coding assistants in development workflows.
- Proven track record of shipping production AI or data systems.
What will make you great
- Experience with multi-tenant data architectures and isolation patterns.
- Experience with Elasticsearch, OpenSearch, or similar search engines.
- Background in NLP, information extraction, or document understanding.
- Experience with Kafka or similar messaging systems.
- Experience in regulated industries with strict audit and versioning requirements.
- Contributions to open-source retrieval, embedding, or parsing tools.
What You’ll Get
- Join a small, high-impact AI team where the data layer is a core product enabler, not backend plumbing.
- Direct access to leadership with fast feedback loops and real influence on architecture.
- AI-first culture that treats AI tools as productivity multipliers.
- Competitive compensation, benefits, and flexible working.
- Opportunity to build the core data foundation of a fast-scaling compliance intelligence platform.