AI Engineer - Performance Team - Coverflex

AI Engineer

Mid-level AI Engineer focused on building and optimizing AI-driven compensation features, using the OpenAI API, the Vercel SDK, and LLM evaluation and observability tools such as Langsmith, Langfuse, and Braintrust.

About the role

Coverflex

Work changed. Pay didn’t.

Coverflex exists to make compensation work for everyone. Pay is still rigid, fragmented, and hard to feel. We turn compensation into choice - one platform, one card, one app - for benefits, meal allowance, insurance and more.

Our platform is simple for HR and meaningful for employees. We provide choice, smarter compensation tools and empowerment.

⚙️ TL;DR (The Essentials)

Role: AI Engineer

Seniority Level: Mid

Type: Individual Contributor

Languages: English (main) / Portuguese, Spanish or Italian a plus

Main Tools:

→ OpenAI API

→ Vercel SDK or similar

→ Langsmith / Braintrust / Langfuse or similar

→ Cursor / Codex / Claude Code

Location: Remote (Europe only)

Compensation:

Base Salary: 40K to 60K gross annually

Equity: Yes – VSOPs

Benefits: See below

Contract Type: Permanent

💥 Your Impact

This role will play a major part in our success because…

With our goals to scale AI, especially for expansion post-Series B, we need AI that meets a minimum threshold of reliability. Our AI already handles thousands of conversations per week, and we expect that to increase significantly. We need dedicated effort to make sure that our AI is accurate, helpful and reliable.

You’ll know you’re successful when, after 90 days, you’ve…

1. Maintained evals to keep accuracy close to 100% and updated them as new feedback came in

2. Monitored user feedback to ensure our agents have a high level of accuracy

3. Run new experiments to proactively increase the performance of current agents (testing new prompts, models, agent architectures)

4. Kept up to date with AI developments

5. Designed new agents

6. Maintained the evals infrastructure so it runs reliably, and at reasonable time and cost, in our CI/CD pipeline

How we’ll measure success:

1. Eval coverage exists for all production agents. Evals are updated shortly after new feedback. Accuracy stays above the agreed threshold and matches real-world performance.

2. Feedback loop is active - issues are caught and logged. Time to detect accuracy problems decreases. Recurring issues are identified and addressed.

3. Number of experiments run and documented per month, resulting in improvements shipped to production.

4. New agents go from concept to production. Agents meet the quality bar on first iterations (fewer back-and-forth cycles).

⚡ Reality Check - What Makes This Role Hard

Let's be real - here's what makes this role challenging:

The biggest hurdle is that this is a new field. Processes are still being created, and the role requires fast adaptation to new developments in the industry. LLMs are also inherently non-deterministic, which…

Skills