onsite

Senior Product Manager, Conversational AI Chatbot & Agent Quality

OKX is seeking a Senior Product Manager to lead the development and improvement of conversational AI chatbot and agent products. This role focuses on owning the quality systems, data pipelines, and operational platforms for AI agents, with a strong emphasis on driving business results through enhanced resolution rates and customer satisfaction. The ideal candidate will have hands-on experience in knowledge base management, agent evaluation, and designing human-in-the-loop workflows.

About the role

About The Opportunity

We are looking for an execution-focused Product Manager who has built and improved conversational AI products in production — and has business results to prove it. A strong plus is hands-on experience with agent evaluation harnesses or internal agent platform product design: you've defined the systems that test, score, and operate agents at scale, not just shipped the agents themselves.

You work in logs and specs, not just decks. You know what a bad retrieval chunk looks like, you've personally written labeling guidelines, and you can point to a quarter where your work moved resolution rate by double digits.

What We Are Looking For

You have hands-on experience building and operating conversational AI products in production — not just shipping agents, but owning the quality systems, data pipelines, and operational platforms that keep them reliable at scale. Ideal candidates will have background in one or more of the following areas:

Knowledge Base & Data Quality — knowledge base architecture, retrieval quality tuning, content governance, labeling pipelines, annotation guidelines, training data impact tracking, and dataset freshness management
Agent Evaluation & Quality Assurance — evaluation harness design, test case schemas, automated scoring rubrics (correctness, groundedness, tool-use accuracy), LLM-as-judge evaluation, regression testing for non-deterministic systems, and feedback-driven improvement loops
Chatbot Operations & Dialogue Design — SOP-to-agent-flow translation, edge case handling, escalation path design, log-based failure triage, and metrics ownership (resolution rate, fallback rate, per-intent accuracy, CSAT)
Agent Runtime & Observability Platforms — agent runtime product requirements, tool permission models, task configuration interfaces, developer-facing observability dashboards, failure alerting logic, and debugging workflows
Human-in-the-Loop Workflows — low-confidence case routing, reviewer task interface design, correction data capture, and feedback loop integration back into training or knowledge pipelines

Chatbot & Knowledge Base (Core)

Built or rebuilt a knowledge base — defined structure, wrote/reviewed content, fixed retrieval quality, saw metrics improve
Designed SOPs that became agent flows — mapped real business processes, handled edge cases, shipped as working dialogue flows
Owned a labeling pipeline — wrote annotation guidelines, QA'd batches, tracked whether labeled data moved production metrics
Moved a metric that mattered — resolution rate, fallback rate, CSAT — and can explain exactly what changed

Agent Harness & Platform Product (Strong Plus)

Designed an agent evaluation harness: defined test case schemas, scoring rubrics, and spec'd automated evaluation pipelines with engineering
Product-designed an internal agent platform: defined requirements for agent runtime — tool permission models, task configuration interfaces, developer-facing observability dashboards, and failure debugging workflows; owned the roadmap and shipped iteratively
Closed the eval-to-improvement loop: used harness output to prioritize knowledge fixes, prompt revisions, or flow changes — not just reported scores but drove action from them
Designed human-in-the-loop review workflows: low-confidence case routing, reviewer task interfaces, correction data capture, and feedback loops back into training or knowledge pipelines

What You’ll Be Doing

Chatbot Operations

Knowledge base ownership: structure, content quality, retrieval coverage, freshness governance
SOP & dialogue flow design: business process → agent flow → edge case handling → escalation paths
Labeling pipeline: annotation specs, annotator QA, training batch impact tracking
Daily quality work: log review, failure triage, weekly knowledge/flow update cadence
Metrics ownership: resolution rate, fallback rate, per-intent accuracy, CSAT

Agent Harness & Platform Product

Define and maintain agent evaluation frameworks: test case design, automated scoring criteria, regression test coverage
Own the quality feedback loop: harness results → prioritized fixes → re-evaluation → production deployment
Partner with engineering to define product requirements for agent runtime: spec observability features, tool call monitoring interfaces, failure alerting logic, and developer-facing debugging tools — own the backlog, not the ops
Design human-in-the-loop workflows: case routing logic, reviewer interfaces, correction data capture
Track agent version performance over time; maintain eval dashboards that teams actually use

What We Look For In You

3–6 years of PM experience, including at least 2 years as primary owner of a production chatbot or AI agent product
Quantified business results: you can describe the baseline metrics, what you did, and the outcome in numbers
Hands-on knowledge base, labeling, and conversation analysis experience (not just oversight)
Familiar with at least one chatbot/agent platform (Coze, Dify, Dialogflow, or similar)
Mandarin Chinese fluency required; English proficiency required

Nice-To-Haves

Designed an agent eval harness: written test case specs, defined scoring rubrics (correctness, groundedness, tool-use accuracy), and spec'd the automated evaluation pipeline with engineering
Product-designed an internal agent platform: defined product requirements for agent runtime — tool permission models, task configuration interfaces, developer-facing observability and debugging workflows; owned roadmap and shipped iteratively
Experience with LLM-as-judge evaluation: you have used model-based scoring in a harness and understand its blind spots
Familiar with agent observability tooling (LangSmith, Langfuse, or internal equivalents), well enough to define what the product needs to surface rather than to operate the tools yourself
Experience spec'ing regression testing for non-deterministic systems: you know how to define quality regression detection when LLM outputs vary
Wrote product specs for human-in-the-loop workflows: low-confidence case routing, reviewer task interfaces, correction data capture, and feedback loop design
Background in customer service, operations, or financial services domain
