remote

AI Evaluation Specialist - Handshake

Software Engineer

AI Evaluation Specialist focused on enhancing domain‑specific LLM performance through prompt design, data annotation, and iterative testing using Python and NLP techniques.

About the role

Key Responsibilities

Design, test, and refine prompts tailored to specific domains to improve LLM accuracy and relevance.
Collect, annotate, and curate high‑quality datasets for model evaluation and training.
Analyze model outputs, identify failure modes, and recommend improvements to prompt strategies.
Collaborate with research teams to integrate findings into ongoing AI projects.
Document methodologies, results, and best practices for internal knowledge sharing.

Requirements

Strong background in NLP and experience working with LLMs such as GPT‑4.
Proficiency in Python and data manipulation libraries (pandas, NumPy).
Hands‑on experience with prompt engineering and evaluation frameworks.
Excellent analytical skills and attention to detail in data annotation.
Self‑motivated and able to work independently in a remote, asynchronous environment.

Skills

Python, natural language processing

Company: Handshake

Department: Engineering

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026