On-site

Senior HPC Systems Administrator - University of Oxford

Systems Engineer

Lead the design, deployment, and optimisation of HPC infrastructure for AI and computer vision research, managing Linux clusters, GPU resources, and cloud integration.

About the role

Key Responsibilities

Design, deploy, and maintain Linux‑based HPC clusters, ensuring high availability and performance for AI workloads.
Configure and optimise GPU resources, including NVIDIA CUDA, for deep learning and computer vision pipelines.
Implement and manage workload scheduling with Slurm, tailoring policies for research groups.
Integrate on‑premise infrastructure with AWS services (ECS, S3, EC2) to support hybrid cloud workflows.
Develop automation scripts (Python, Bash) for routine administration and monitoring.
Collaborate with researchers to understand computational needs and provide technical guidance.

Requirements

5+ years of experience administering HPC or large‑scale Linux systems.
Strong knowledge of GPU computing, CUDA, and deep learning frameworks.
Proficiency with Slurm, Ansible, and cloud platforms (AWS preferred).
Excellent scripting skills in Python or Bash.
Effective communication and teamwork in an academic research environment.

Skills

AWS

Company: University of Oxford

Department: Engineering

Location: Oxford, England, United Kingdom

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 19, 2026