onsite

Senior System Administrator - High Performance Computing

Systems Engineer

Senior System Administrator responsible for designing, securing, and maintaining multi‑vendor HPC clusters, storage, and networking, using Linux, Slurm, Ansible, and Python to support mission‑critical scientific workloads.

About the role

Key Responsibilities

Design, deploy, and operate high‑performance computing clusters across heterogeneous vendor platforms.
Implement and maintain job scheduling and resource management using Slurm, ensuring optimal utilization and reliability.
Automate system provisioning, configuration, and patch management with Ansible and custom Python scripts.
Manage high‑throughput storage solutions (e.g., Lustre, GPFS) and ensure data integrity and performance.
Monitor system health, troubleshoot hardware/software issues, and coordinate with vendors for rapid resolution.
Enforce security hardening, access controls, and compliance standards for all HPC infrastructure.

Requirements

5+ years of experience administering Linux‑based HPC environments.
Strong knowledge of cluster scheduling (Slurm) and parallel file systems (Lustre, GPFS).
Proficiency in automation tools such as Ansible and scripting in Python or Bash.
Experience with virtualization/container technologies (KVM, Docker) and high‑speed networking (InfiniBand, Ethernet).
Demonstrated ability to troubleshoot complex hardware and software issues in mission‑critical settings.

Skills

linuxansiblepython

DepartmentEngineering

LocationLehi, Utah, United States

Experience5+ years

Tenurefull-time

LevelSenior

Posted June 25, 2026