AI Platform Engineer (Remote)

As an AI Platform Engineer, you will design and evaluate autonomous AI agents across various domains, providing expert human feedback to leading AI organizations. This role involves debugging agent traces, stress testing agents, assessing software architecture, and delivering high-density technical feedback for LLM training.

About the role

Role Overview

Help design and evaluate autonomous AI agents across multiple LLMs, in domains spanning health, education, daily life, and other real-world settings. All tasks are coding work. You will shape the future of agentic AI systems by providing expert human feedback to leading AI organisations, and help train Large Language Models (LLMs) for complex, multi-step architectural workflows.

Key Responsibilities

AI Agent Evaluation

Write evaluation rubrics with objective pass/fail criteria
Debug agent traces to identify failure patterns
Stress test agents against edge cases, prompt injection, and tool misuse

Technical Assessment

Assess production-grade modular software architecture
Analyse multi-turn system interactions and behaviours
Provide high-density technical feedback for LLM training

Project Workflow

Create an account and upload a resume/ID
Complete the onboarding assessment
Start earning through flexible task assignments

Qualifications

Experience in backend engineering, AI automation, or complex systems integration
Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting)
Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases
Practical experience building for live, non-mocked environments and handling multi-turn system interactions

Preferred (Nice to Have)

Experience integrating agents with live tools such as Supabase, Gmail, and other APIs
Familiarity with persistent state and session-tracking patterns
Experience in identifying privacy leaks, authority escalation, or indirect prompt injection vulnerabilities

Skills