Remote / Onsite
AI Benchmarking Specialist - Amazon.com
Software Engineer
Lead AI benchmarking for Gen‑AI/LLM tools: design tests and analyze model quality, compliance, robustness, and fairness to drive seller growth on a global platform.
About the role
Key Responsibilities
- Design and execute comprehensive benchmarking and audit protocols for Gen‑AI and LLM solutions used by international sellers.
- Collect, annotate, and analyze data to evaluate model performance, compliance, robustness, and fairness.
- Collaborate with cross‑functional teams to refine AI tools, ensuring they meet business and regulatory standards.
- Document findings, create detailed reports, and present actionable insights to stakeholders.
- Continuously improve benchmarking frameworks and tools based on emerging AI research and industry best practices.
Requirements
- Strong background in Machine Learning and experience with Large Language Models.
- Ability to work independently and collaborate across global teams.
Skills
- Python
- Machine learning