Remote
HPC Systems Engineer - KLA
Systems Engineer
Lead the design, deployment, and optimization of high‑performance computing clusters for semiconductor manufacturing, leveraging Linux, Python, C++, MPI, and GPU technologies to deliver scalable, high‑throughput solutions.
About the role
Key Responsibilities
- Architect, install, and maintain HPC clusters that support wafer and reticle manufacturing workloads.
- Develop and optimize parallel applications using MPI, OpenMP, and CUDA to maximize performance on GPU‑enabled nodes.
- Implement performance monitoring, profiling, and tuning to achieve target throughput and reduce latency.
- Collaborate with software and hardware teams to integrate new accelerators and storage solutions.
- Automate deployment and configuration using Python scripts and configuration management tools.
- Provide technical support and troubleshooting for production HPC environments.
Requirements
- 3+ years of experience in HPC systems engineering or a related field.
- Strong proficiency in Linux system administration, Python, and C++ programming.
- Hands‑on experience with MPI, OpenMP, CUDA, and GPU performance optimization.
- Knowledge of cluster schedulers (Slurm, PBS) and performance monitoring tools.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, cross‑functional team.