Onsite

AI Infrastructure & Experience Engineer - DGN Technologies

Software Engineer

Design, deploy, and optimize large language and multimodal models on on‑prem GPU hardware, leveraging CUDA, Python, and deep learning frameworks to maximize inference performance and cost efficiency.

About the role

Key Responsibilities

Deploy, benchmark, and fine‑tune multiple LLMs and generative multimodal models on local inference servers.
Optimize inference latency (time to first token, TTFT) and throughput (tokens/sec) using model quantization, caching, and architecture‑specific tweaks.
Develop custom CUDA kernels and integrate them with PyTorch/TensorRT to fully exploit GPU resources.
Maintain Linux‑based infrastructure, including driver updates, container orchestration, and monitoring tools.
Collaborate with data scientists and product teams to translate model requirements into scalable, production‑ready pipelines.

Requirements

Strong experience with Python and deep learning frameworks such as PyTorch.
Deep knowledge of CUDA programming, kernel development, and GPU performance profiling.
Hands‑on expertise in model quantization, TensorRT, and inference optimization techniques.
Proficiency in Linux system administration and containerization (Docker, Kubernetes).
Demonstrated ability to deploy and troubleshoot large language models in production environments.

Skills

Python, CUDA, Linux, PyTorch

Company: DGN Technologies

Department: Engineering

Location: Mountain View, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026