remote
AI Evaluation Specialist - Handshake
Software Engineer
AI Evaluation Specialist focused on enhancing domain‑specific LLM performance through prompt design, data annotation, and iterative testing using Python and NLP techniques.
About the role
Key Responsibilities
- Design, test, and refine prompts tailored to specific domains to improve LLM accuracy and relevance.
- Collect, annotate, and curate high‑quality datasets for model evaluation and training.
- Analyze model outputs, identify failure modes, and recommend improvements to prompt strategies.
- Collaborate with research teams to integrate findings into ongoing AI projects.
- Document methodologies, results, and best practices for internal knowledge sharing.
Requirements
- Strong background in NLP and experience working with LLMs such as GPT‑4.
- Proficiency in Python and data manipulation libraries (pandas, NumPy).
- Hands‑on experience with prompt engineering and evaluation frameworks.
- Excellent analytical skills and attention to detail in data annotation.
- Self‑motivated and able to work independently in a remote, asynchronous environment.
Skills
Python
Natural language processing