AI Engineer
Mid‑level AI Engineer focusing on building and optimizing AI‑driven compensation features, leveraging OpenAI API, Vercel deployment, and prompt‑engineering tools such as Langsmith, Langfuse, and Braintrust.
Coverflex
Work changed. Pay didn’t.
Coverflex exists to make compensation work for everyone. Pay is still rigid, fragmented, and hard to feel. We turn compensation into choice - one platform, one card, one app - for benefits, meal allowance, insurance and more.
Our platform is simple for HR and meaningful for employees. We provide choice, smarter compensation tools and empowerment.
⚙️ TL;DR (The Essentials)
Role: AI Engineer
Seniority Level: Mid
Type: Individual Contributor
Languages: English (main) / Portuguese, Spanish or Italian a plus
Main Tools:
→ OpenAI API
→ Vercel SDK or similar
→ Langsmith / Braintrust / Langfuse or similar
→ Cursor / Codex / Claude Code
Location: Remote (Europe only)
Compensation:
Base Salary: 40K to 60K gross annually
Equity: Yes – VSOPs
Benefits: See below
Contract Type: Permanent
💥 Your Impact
This role will play a major part in our success because…
With our goal of scaling AI, especially for expansion post-Series B, we need AI that meets a minimum threshold of reliability. Our AI already handles thousands of conversations per week, and we expect that to increase significantly. We need dedicated effort to make sure that our AI is accurate, helpful and reliable.
You’ll know you’re successful when, after 90 days, you’ve…
1. Maintained evals that keep accuracy close to 100% and updated them as new feedback came in
2. Monitored user feedback to ensure our agents have a high level of accuracy
3. Run new experiments to proactively improve the performance of current agents (testing new prompts, models, and agent architectures)
4. Kept up to date with AI developments
5. Designed new agents
6. Maintained the evals infrastructure so it runs reliably, and at reasonable time and cost, in our CI/CD pipeline
How we’ll measure success:
1. Eval coverage exists for all production agents. Evals are updated shortly after new feedback. Accuracy stays above the agreed threshold and matches real-world performance.
2. Feedback loop is active - issues are caught and logged. Time to detect accuracy problems decreases. Recurring issues are identified and addressed.
3. Number of experiments run and documented per month, resulting in improvements shipped to production.
4. New agents go from concept to production. Agents meet quality bar on first iterations (fewer back-and-forth cycles).
⚡ Reality Check - What Makes This Role Hard
Let's be real - here's what makes this role challenging:
The biggest hurdle is that this is a new field. Processes are still being created, and the role requires fast adaptation to new developments in the industry. Also relevant is that LLMs are inherently non-deterministic, which…
Posted June 24, 2026