remote

Remote LLM Personal Assistant Evaluation Specialist - 70-180/hour - 24 May

Research Engineer

This remote LLM Personal Assistant Evaluation Specialist role is for advanced LLM power users who design realistic prompts, evaluate AI outputs against detailed rubrics, and assess AI-assisted tasks across food, health, productivity, and personal workflow scenarios.

About the role

We are sharing a specialised part-time consulting opportunity for advanced LLM power users experienced in personalized AI workflows, rubric-based evaluation, real-world task assessment, personal productivity systems, and high-context decision support.

This role supports current and upcoming remote consulting opportunities focused on evaluating how AI systems handle personalized, real-world life tasks across food, health, productivity, career, learning, research, planning, and personal workflow scenarios. Selected professionals will create realistic prompts, complete complex AI-assisted tasks, record workflow execution, design or apply detailed rubrics, and evaluate whether AI outputs are useful, personalized, practical, safe, and successful in real-life contexts.

Key Responsibilities

Professionals in this role may contribute to:

Personalized AI Task Evaluation

Create written responses, prompts, and explanations for complex personal-life tasks
Evaluate whether AI outputs are practical, well-reasoned, personalized, realistic, and successful
Identify where outputs succeed, miss context, overreach, provide generic advice, or fail to account for real constraints
Use hands-on LLM experience to assess real-world usefulness across high-context personal workflows

Rubric Design & Quality Assessment

Apply structured rubrics and quality criteria to evaluate AI system performance
Create detailed evaluation rubrics for complex personal tasks and multi-step workflows
Judge outputs against criteria involving usefulness, personalization, reasoning quality, safety, completeness, and success conditions
Write clear, specific, and well-supported feedback explaining evaluation decisions

Real-World Workflow Execution

Execute AI-assisted tasks while recording screens according to project instructions
Review task performance across tools, prompts, reasoning steps, outputs, and final recommendations
Complete research-intensive personal workflows end-to-end within expected turnaround timelines
Maintain careful documentation of task setup, execution, rubric design, and evaluation results

Ideal Profile

Strong candidates may have:

Heavy personal usage of LLM products and AI tools
Experience using AI for multi-step tasks, planning, research, decision-making, personal workflows, or life administration
Familiarity with tools such as ChatGPT, Claude, Gemini, Perplexity, Cursor, Windsurf, Codex, or other AI agents
Strong ability to explain what makes an AI output useful, incomplete, unsafe, unrealistic, generic, or poorly personalized
Extensive rubric experience, including prior rubric design, evaluation, and quality assessment work
Strong wri

Skills