onsite
Senior Linux System Administrator, HPC
Systems Engineer
Senior Linux System Administrator responsible for designing, deploying, and maintaining multi‑vendor HPC clusters, automating operations with Ansible, scripting in Python/Bash, and optimizing performance for mission‑critical scientific workloads.
About the role
Key Responsibilities
- Design, install, and configure Linux‑based HPC clusters across multiple vendor platforms.
- Automate provisioning, configuration management, and routine maintenance using Ansible and custom scripts.
- Monitor system health, troubleshoot hardware/software issues, and tune performance to meet the demands of computational workloads.
- Manage storage, networking, and security components to ensure high availability and data integrity.
- Collaborate with scientists and engineers to optimize environments for large‑scale simulations and data analysis.
Requirements
- 5+ years of Linux system administration experience, preferably in HPC or research environments.
- Strong knowledge of HPC architectures, job schedulers (e.g., Slurm, PBS), and multi‑vendor hardware.
- Proficiency with Ansible for configuration management and automation.
- Advanced scripting skills in Python and Bash for tool development and workflow automation.
- Experience with performance analysis, tuning, and troubleshooting of compute, storage, and network subsystems.
Skills
Linux, Ansible, Python, Bash