About the Role
At NVIDIA, you will be building GeForce G-Assist — an on-device AI assistant that combines Small Language Models (SLMs), retrieval systems, and hybrid cloud capabilities to deliver responsive, context-aware assistance inside the GeForce ecosystem. You will work closely across engineering and product teams to ensure G-Assist performs reliably in real-world scenarios.
What you'll be doing:
- Evaluate and improve the Small Language Models used in GeForce G-Assist, with an emphasis on accuracy, robustness, and conversational reliability. The focus is on how models behave in production, not just on benchmarks. Identify and mitigate conversation and context contamination, including state drift, prompt leakage, and retrieval cross-talk.
- Work with SLM and VLM architectures to support text and multimodal interactions. Collaborate on hybrid architectures that combine local SLMs with cloud-based models. We value engineers who enjoy thinking across the full system, from model behavior to runtime performance.
- Optimize local inference using llama.cpp, including quantization, memory usage, and performance tuning. Read, write, and optimize C/C++ code in performance-critical paths.
- Design and integrate retrieval-augmented generation (RAG) systems that ground responses in system and user context. Support agentic AI workflows, enabling planning, tool use, and multi-step execution.
What we need to see:
- 8+ years of validated experience in system software or a related field, with an M.S. or higher degree in Computer Science, Data Science, Engineering, or a related field (or equivalent experience). We’re looking for teammates who enjoy solving real problems, learning as they go, and collaborating in a tight-knit environment.
- Strong ability to read and write C/C++ code in systems-level or performance-sensitive environments, along with proficiency in Python. Hands-on experience with llama.cpp or similar local inference frameworks.
- Hands-on experience evaluating Small Language Models, including task-based and conversational testing, with an understanding of conversation dynamics, long-context behavior, and contamination challenges.
- Knowledge of SLM and VLM architectures and their trade-offs, experience with retrieval technologies and language-model integration, and familiarity with agentic AI patterns such as tool use and planning.
Ways to stand out from the crowd:
- Experience contributing to language or multimodal models that power user-facing products, features, or workflows.
- A track record of collaborating with product, platform, or systems teams to balance model capability, performance, and user experience.
- Demonstrated ability to translate user needs or feedback into measurable improvements in model behavior or system reliability.