onsite

LLM Engineer, LLM Evaluation - 42dot

LLM Engineer

Lead the design and automation of large‑language‑model evaluation pipelines, building benchmark datasets, evaluation protocols, and end‑to‑end workflows on Kubernetes with MLflow and Argo Workflows to continuously improve model quality and reliability.

About the role

Key Responsibilities

Design and maintain LLM evaluation frameworks, including benchmark datasets and evaluation metrics (human and LLM‑based).
Develop and automate end‑to‑end evaluation pipelines on Kubernetes, integrating Argo Workflows and MLflow for experiment tracking and deployment validation.
Establish fair comparison protocols to benchmark multiple LLMs, ensuring reproducibility and consistency across experiments.
Collaborate with research and engineering teams to iterate on model improvements based on evaluation insights.
Monitor and enhance the reliability of the evaluation platform, scaling resources and optimizing performance.

Requirements

Strong experience with Python and data‑engineering tools for large‑scale model evaluation.
Proficiency in Kubernetes, Argo Workflows, and MLflow for orchestrating and tracking experiments.
Hands‑on experience designing benchmark datasets and evaluation metrics for LLMs.
Excellent problem‑solving skills and ability to work in a fast‑moving, research‑driven environment.
Effective communication skills to present findings to cross‑functional teams.

Skills

Python, Kubernetes, MLflow

Company: 42dot

Department: Research

Location: Pangyo, Republic of Korea

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026