Onsite
AI Infrastructure & Experience Engineer - DGN Technologies
Software Engineer
About the role
Design, deploy, and optimize large language and multimodal models on on‑prem GPU hardware, leveraging CUDA, Python, and deep learning frameworks to maximize inference performance and cost efficiency.
Key Responsibilities
- Deploy, benchmark, and fine‑tune multiple LLMs and generative multimodal models on local inference servers.
- Optimize inference latency (time to first token, TTFT) and throughput (tokens/sec) using model quantization, caching, and architecture‑specific tweaks.
- Develop custom CUDA kernels and integrate them with PyTorch/TensorRT to fully exploit GPU resources.
- Maintain Linux‑based infrastructure, including driver updates, container orchestration, and monitoring tools.
- Collaborate with data scientists and product teams to translate model requirements into scalable, production‑ready pipelines.
Requirements
- Strong experience with Python and deep learning frameworks such as PyTorch.
- Deep knowledge of CUDA programming, kernel development, and GPU performance profiling.
- Hands‑on expertise in model quantization, TensorRT, and inference optimization techniques.
- Proficiency in Linux system administration and containerization (Docker, Kubernetes).
- Demonstrated ability to deploy and troubleshoot large language models in production environments.
Skills
Python, CUDA, Linux, PyTorch