onsite

Founding ML Engineer (India)

Crustdata is seeking a Founding ML Engineer to lead the research and engineering of their core intelligence layer, focusing on real-time B2B data for AI agents. This role involves developing and shipping ML models to process, search, match, and enrich hundreds of millions of professional profiles and company records from across the web.

About the role

We're building the gateway to the internet for AI agents. Our APIs already power hundreds of customers — and we went from 0 to $7M ARR in our first 12 months. Now we need someone who can push the boundaries of what our ML systems can do.

We're hiring a Founding ML Engineer to own the research and engineering behind our core intelligence layer. Our platform indexes hundreds of millions of professional profiles and company records from across the web. Making that data searchable, matchable, and enriched is an ML problem at its core.

This is not an MLOps role. You will be researching, training, and shipping models, from paper to prototype to production.

What you'll be doing

Own the ML systems that turn messy, multilingual, web-scale data into structured intelligence.
Solve problems like: returning relevant professional profiles across multiple languages for complex search queries (e.g., "RevOps professionals" returning "Head of Revenue Department," "Revenue Operations Manager," and "VP Sales Operations," across English, French, and German).
Automatically resolve different data sources that refer to the same company across millions of records.
Infer org charts, team structures, and reporting relationships from raw people data.
Detect technologies used by a company from unstructured signals scattered across the web.
Classify job changes (promotion, lateral move, demotion, title edit) for millions of transitions.
Map raw job titles to canonical titles, seniority levels, and job functions across various languages and naming conventions.

Who you are

3+ years building and shipping ML models in production — specifically in NLP, information retrieval, or entity resolution.
Strong with transformer architectures — you've trained and fine-tuned encoder models, not just called APIs.
Proficient in building and evaluating retrieval systems, classifiers, and embedding models.
Comfortable with contrastive learning, metric learning, and representation learning.
Experience using LLMs for structured extraction, classification, or data generation at scale.
Strong in Python and PyTorch.
A true grinder — we work very hard.
Founder mentality — someone who wants to be a founder in the future OR was a founder earlier.

Nice to haves

Experience with entity resolution or record linkage at scale.
Built taxonomy or ontology systems over messy real-world data.
Background in multilingual NLP or cross-lingual transfer.
Scaled LLM inference pipelines in production.
Published research or open-source contributions in NLP/IR.
Experience with distributed training on GPU clusters.

Skills