Software Engineer
Conduct remote testing of an AI agent in Spanish, crafting prompts to provoke harmful or toxic responses and evaluating safety. Apply prompt engineering, NLP, and AI safety techniques to improve content moderation and ensure safe user interactions.
Project Objective
This project aims to test and evaluate one AI agent in the development stage. By asking questions and requests, based on various categories (the training is provided), you will try to force the AI agent to say something harmful, offensive, dangerous or toxic an evaluate it’s responses.
The end goal is to prevent AI agent from sharing any harmful, offensive, dangerous or toxic content to the end users and make it safer.
This role is fully remote, full-time with a duration of several months (depending on the project progress it might be prolonged).
Full online training before the project will be provided.
Responsibilities:
Requirements:
Originally posted on Himalayas
Posted June 22, 2026