onsite
Staff AI Engineer - RBC
AI Engineer
Lead the design and implementation of a real‑time, low‑latency evaluation engine for AI agents, leveraging Python, AWS, and distributed data pipelines to turn high‑cardinality traces into actionable insights for production quality assurance.
About the role
Key Responsibilities
- Architect and build a scalable, low‑latency runtime system that ingests and analyzes high‑cardinality agent traces in real time.
- Design data pipelines and storage solutions on AWS to support continuous evaluation and metric collection.
- Collaborate with ML teams to define evaluation metrics, thresholds, and actionable insights for agent performance.
- Implement monitoring, alerting, and observability for the evaluation engine, ensuring reliability and uptime.
- Drive performance optimization and cost‑efficiency across the data stack.
Requirements
- 10+ years of software engineering experience, with a strong background in distributed systems and real‑time data processing.
- Proficiency in Python and AWS services (Kinesis, Lambda, DynamoDB, S3, Athena).
- Hands‑on experience with machine learning model evaluation and metrics engineering.
- Deep understanding of site reliability engineering principles, including monitoring, alerting, and incident response.
- Excellent communication skills and a proven ability to work cross‑functionally with ML, data, and product teams.
Skills
Python, AWS, machine learning