onsite
Senior Solutions Architect, Generative AI
NVIDIA is seeking a Senior Solutions Architect with expertise in Generative AI, focusing on training Large Language Models (LLMs) and implementing Retrieval-Augmented Generation (RAG) workflows. This role involves architecting end-to-end AI solutions, collaborating with customers, and providing technical leadership in generative AI technologies using NVIDIA's platforms.
About the Role
NVIDIA is seeking a dynamic and experienced Generative AI Solutions Architect with specialized expertise in training Large Language Models (LLMs) and implementing workflows based on Retrieval-Augmented Generation (RAG). As a key member of our AI Solutions team, you will play a pivotal role in architecting and delivering cutting-edge solutions built on NVIDIA's generative AI technologies. This position requires a deep understanding of language models, particularly LLMs, and strong proficiency in designing and implementing RAG-based workflows.
What You Will Be Doing
- Architect end-to-end generative AI solutions with a focus on LLMs and RAG workflows.
- Collaborate closely with customers to understand their language-related business challenges and design tailored solutions.
- Collaborate with sales and business development teams to support pre-sales activities, including technical presentations and demonstrations of LLM and RAG capabilities.
- Work closely with NVIDIA engineering teams to provide feedback and contribute to the evolution of generative AI technologies.
- Engage directly with customers to understand their language-related requirements and challenges.
- Lead workshops and design sessions to define and refine generative AI solutions focused on LLMs and RAG workflows.
- Lead the training and optimization of Large Language Models using NVIDIA’s hardware and software platforms.
- Implement strategies for efficient and effective training of LLMs to achieve optimal performance.
- Design and implement RAG-based workflows to enhance content generation and information retrieval.
- Work closely with customers to integrate RAG workflows into their applications and systems.
- Stay abreast of the latest developments in language models and generative AI technologies.
- Provide technical leadership and guidance on best practices for training LLMs and implementing RAG-based solutions.
What We Need To See
- Master's or Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
- 5+ years of hands-on experience in a technical role, specifically focusing on generative AI, with a strong emphasis on training Large Language Models (LLMs).
- Proven track record of deploying and optimizing LLMs for inference in production environments.
- In-depth understanding of state-of-the-art language models, including but not limited to GPT-3, BERT, or similar architectures.
- Expertise in training and fine-tuning LLMs using popular frameworks such as TensorFlow, PyTorch, or Hugging Face Transformers.
- Proficiency in model deployment and optimization techniques for efficient inference on various hardware platforms, with a focus on GPUs.
- Strong knowledge of GPU cluster architecture and the ability to leverage parallel processing for accelerated model training and inference.
- Excellent communication and collaboration skills with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
- Experience leading workshops, training sessions, and presenting technical solutions to diverse audiences.
Ways To Stand Out From The Crowd
- Experience deploying LLMs in cloud environments (e.g., AWS, Azure, GCP) and on on-premises infrastructure.
- Proven ability to optimize LLMs for inference speed, memory efficiency, and resource utilization.
- Familiarity with containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes) for scalable and efficient model deployment.
- Deep understanding of GPU cluster architecture, parallel computing, and distributed computing concepts.
- Hands-on experience with NVIDIA GPU technologies and GPU cluster management, and the ability to design and implement scalable and efficient workflows for LLM training and inference on GPU clusters.