onsite
System Administrator I - High Performance Computing
Systems Engineer
Entry‑level System Administrator responsible for deploying, monitoring, and supporting Linux‑based HPC clusters and for managing job schedulers, storage, and network resources to ensure reliable, secure, high‑throughput computing for mission‑critical workloads.
About the role
Key Responsibilities
- Install, configure, and maintain Linux servers and HPC nodes supporting scientific and intelligence workloads.
- Administer job scheduling systems (e.g., Slurm) to optimize resource allocation and job throughput.
- Monitor cluster health, performance metrics, and storage systems; troubleshoot hardware and software issues.
- Develop and maintain automation scripts (Python, Bash) for provisioning, patching, and routine maintenance.
- Implement security controls, user access policies, and compliance measures across the HPC environment.
- Collaborate with engineers and researchers to assess capacity needs and plan infrastructure upgrades.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Hands‑on experience with Linux system administration and basic networking concepts.
- Familiarity with HPC job schedulers such as Slurm or Grid Engine.
- Proficiency in scripting languages, preferably Python or Bash, for automation tasks.
- Understanding of virtualization, storage technologies, and security best practices in a multi‑user environment.