onsite

Lead Systems Engineer, HPC - Princeton University

Systems Engineer

Lead the design, deployment, and maintenance of HPC and AI infrastructure, collaborating with researchers and vendors to deliver scalable, high‑performance computing solutions on Linux platforms.

About the role

Key Responsibilities

Design, install, and manage HPC clusters, ensuring optimal performance and reliability for research workloads.
Collaborate with faculty, researchers, and vendors to specify hardware and software requirements for AI and HPC projects.
Configure and maintain cluster software stacks, including MPI, Slurm, and GPU drivers, and implement security and compliance policies.
Monitor system performance, troubleshoot issues, and implement capacity planning and scaling strategies.
Provide technical guidance and training to research staff on HPC best practices and emerging AI technologies.

Requirements

Extensive experience with Linux-based HPC environments and cluster management tools.
Proficiency in GPU computing, MPI, and job scheduling systems such as Slurm.
Strong understanding of networking, storage, and virtualization technologies in a research context.
Excellent communication skills and ability to work collaboratively with interdisciplinary teams.
Experience with AI frameworks (e.g., TensorFlow, PyTorch) is a plus.

Skills

Linux

Company: Princeton University

Department: Engineering

Location: Mercer County, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted June 21, 2026