remote

HPC Systems Engineer - UC San Diego

Systems Engineer

Lead the design, deployment, and optimization of high‑performance computing clusters, maintaining robust Linux environments, efficient SLURM job scheduling, and performance tuning across hybrid on‑prem and cloud infrastructure.

About the role

Key Responsibilities

Design, install, and maintain HPC clusters, including compute nodes, storage, and networking components.
Configure and optimize SLURM workloads, ensuring efficient job scheduling and resource allocation.
Develop and maintain Python scripts for automation, monitoring, and performance analysis.
Collaborate with researchers to troubleshoot performance bottlenecks and implement tuning strategies.
Integrate hybrid cloud resources (AWS/GCP) to extend capacity and provide elastic compute options.
Document system configurations, procedures, and best practices for internal use.

Requirements

Strong experience with Linux system administration and HPC cluster environments.
Proficiency in SLURM or equivalent workload managers.
Hands‑on scripting skills in Python for automation and data analysis.
Knowledge of HPC networking, high‑speed interconnects, and storage solutions.
Experience with cloud platforms (AWS, GCP) and hybrid deployment models is a plus.

Skills

machine learning, python, bash, linux

Company: UC San Diego

Department: Engineering

Location: San Diego, California, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 115,000

Posted June 26, 2026