remote

Applied Research Scientist, LLM Evaluation & Post-Training - Innodata

Research Engineer

Lead research on large language model evaluation and post‑training: design experiments, metrics, and data pipelines that support trustworthy, high‑quality AI systems, using Python, NLP, and advanced ML techniques.

About the role

Key Responsibilities

Design and execute rigorous evaluation protocols for large language models, including benchmark creation, metric development, and human‑in‑the‑loop assessments.
Develop and maintain scalable data pipelines and annotation frameworks to support model training, fine‑tuning, and post‑training analysis.
Collaborate with cross‑functional teams to integrate evaluation insights into model development cycles, ensuring alignment with safety and performance goals.
Publish research findings, contribute to open‑source tools, and present results at conferences and internal workshops.
Stay current with state‑of‑the‑art NLP, ML, and generative AI research, translating advances into practical evaluation strategies.

Requirements

PhD or Master’s in Computer Science, NLP, or a related field, with a strong research background.
Proficiency in Python and statistical analysis, plus experience with large‑scale ML frameworks.
Deep understanding of NLP evaluation metrics, bias mitigation, and model interpretability.
Experience building data pipelines and working with large datasets.
Excellent communication skills and a track record of publishing in top venues.

Skills

Python, machine learning, natural language processing, generative AI

Company: Innodata

Department: Research

Location: Ridgefield Park, NJ, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 225,000

Posted June 20, 2026