OpenTalent
© 2026 Gravity Engineering Services Pvt. Ltd. All rights reserved. hello@opentalent.in
onsite

Freelance Agent Evaluation Engineer

About the role

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

We're building a dataset to evaluate AI coding agents on how well they handle real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments:

  • Build virtual companies following a high-level plan: a codebase, infrastructure, and context (conversations, documentation, tickets) that together form a realistic environment with a development history
  • Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair
  • Design tasks set in isolated environments that emulate a developer's workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase
  • Write tests that accept all correct solutions and reject incorrect ones: neither too strict (breaking on valid approaches) nor too lenient (passing bad ones)
  • Iterate on tests with an AI agent, verifying that they catch real problems, don't miss bad solutions, and don't break on good ones
  • Review code written by agents, analyze why an agent failed or succeeded, and design edge cases and adversarial scenarios
  • Iterate based on feedback from expert QA reviewers who score your work on quality criteria
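To make the "neither too strict nor too lenient" requirement concrete, here is a minimal Python sketch built around a hypothetical task (not taken from Mindrift's actual projects): the agent must implement `unique_ids(ids)`, returning each ID from the input exactly once, in any order. The checker functions and candidate solutions are illustrative names only, not part of any real grading framework.

```python
"""Sketch of a grader check that is neither too strict nor too lenient.

Hypothetical task for the agent: implement unique_ids(ids), which returns
each ID from the input exactly once; the order of the result does not matter.
"""
from typing import Callable, Iterable, List

Impl = Callable[[List[int]], Iterable[int]]


def check_unique_ids(impl: Impl) -> bool:
    """Return True if `impl` meets the task's stated contract on a few cases."""
    cases = [[], [1], [3, 1, 3, 2, 1], [5, 5, 5]]
    for ids in cases:
        # Pass a copy so an implementation that mutates its input can't
        # change `ids`, which the checks below compare against.
        result = list(impl(list(ids)))
        # Every input ID must appear, and no other values may appear.
        if set(result) != set(ids):
            return False
        # No ID may appear more than once.
        if len(result) != len(set(result)):
            return False
    return True


def too_strict(impl: Impl) -> bool:
    # Pins one ordering the task never asked for, so valid solutions can fail.
    return list(impl([3, 1, 3, 2, 1])) == [3, 1, 2]


def too_lenient(impl: Impl) -> bool:
    # Only checks that the output is no longer than the input, so bad solutions pass.
    return len(list(impl([3, 1, 3, 2, 1]))) <= 5


# Two correct solutions that take different approaches.
def sorted_solution(ids: List[int]) -> List[int]:
    return sorted(set(ids))


def order_preserving_solution(ids: List[int]) -> List[int]:
    return list(dict.fromkeys(ids))


# Two incorrect solutions.
def keeps_duplicates(ids: List[int]) -> List[int]:
    return ids


def drops_everything(ids: List[int]) -> List[int]:
    return []


if __name__ == "__main__":
    for solution in (sorted_solution, order_preserving_solution,
                     keeps_duplicates, drops_everything):
        print(f"{solution.__name__}: balanced={check_unique_ids(solution)}, "
              f"too_strict={too_strict(solution)}, too_lenient={too_lenient(solution)}")
```

The balanced check tests only the contract the task states (same set of IDs, no repeats), so it accepts both the sorted and the order-preserving solutions while rejecting the two broken ones. `too_strict` rejects `sorted_solution` because it pins an ordering the task never required, and `too_lenient` accepts `drops_everything` because it never checks that any IDs are present.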

What this is NOT

  • Not data labeling
  • Not prompt engineering
  • Not writing code from scratch: the agent writes most of the code; you guide and evaluate it

A significant part of the work is done together with AI, because it is very hard to create tasks that challenge frontier models without using frontier models.

What we look for

This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have:

  • Degree in Computer Science, Software Engineering, or related fields
  • 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience writing tests (functional and integration), not just running them
  • Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis)
  • CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
  • English proficiency: B2

Skills

Python, JavaScript, TypeScript, FastAPI, React, Docker
Company: Mindrift
Department: Engineering
Location: Singapore
Experience: 5+ years
Tenure: Full-time
Level: Mid-Level

Posted June 7, 2026