remote
Applied Research Scientist, LLM Evaluation & Post-Training - Innodata
Research Engineer
Lead research on large language model evaluation and post‑training, designing experiments, metrics, and data pipelines to ensure trustworthy, high‑quality AI systems using Python, NLP, and advanced ML techniques.
About the role
Key Responsibilities
- Design and execute rigorous evaluation protocols for large language models, including benchmark creation, metric development, and human‑in‑the‑loop assessments.
- Develop and maintain scalable data pipelines and annotation frameworks to support model training, fine‑tuning, and post‑training analysis.
- Collaborate with cross‑functional teams to integrate evaluation insights into model development cycles, ensuring alignment with safety and performance goals.
- Publish research findings, contribute to open‑source tools, and present results at conferences and internal workshops.
- Stay current with state‑of‑the‑art NLP, ML, and generative AI research, translating advances into practical evaluation strategies.
Requirements
- PhD or Master’s in Computer Science, NLP, or a related field, with a strong research background.
- Proficiency in Python and statistical analysis, plus experience with large‑scale ML frameworks.
- Deep understanding of NLP evaluation metrics, bias mitigation, and model interpretability.
- Experience building data pipelines and working with large datasets.
- Excellent communication skills and a track record of publishing in top venues.
Skills
Python, machine learning, natural language processing, generative AI