onsite
HPC Data Storage Engineer - Sabre Systems
Software Engineer
Design, deploy, and optimize high‑performance storage solutions for large‑scale HPC clusters, handling capacity planning, performance monitoring, and policy enforcement to support mission‑critical defense research.
About the role
Key Responsibilities
- Design, configure, and maintain parallel file systems (e.g., Lustre, IBM Spectrum Scale/GPFS) supporting multi‑petabyte HPC workloads.
- Perform capacity planning, quota management, and hardware lifecycle coordination to ensure continuous availability.
- Monitor storage performance, collect metrics, and implement tuning strategies to meet strict I/O throughput requirements.
- Develop and automate reporting dashboards for storage utilization, performance trends, and compliance metrics.
- Collaborate with HPC architects, network engineers, and security teams to integrate storage solutions with compute and networking infrastructure.
Requirements
- 5+ years of experience managing large‑scale storage systems in HPC or research environments.
- Strong knowledge of Linux operating systems and parallel file system technologies such as Lustre or IBM Spectrum Scale (GPFS).
- Proven ability in capacity planning, performance analysis, and storage policy implementation.
- Proficiency in scripting or programming (Python, Bash) for automation and reporting.
- Experience with storage arrays, SSD/NVMe tiers, and high‑speed interconnects (InfiniBand, Ethernet).