remote
Senior HPC Engineer - Viridien
Software Engineer
Senior HPC Engineer leading advanced Linux system administration, cluster design, and performance tuning for next‑generation compute platforms using Python, Docker, and Kubernetes.
About the role
Key Responsibilities
- Design, deploy, and maintain large‑scale HPC clusters on Linux, ensuring high availability and optimal performance.
- Lead performance tuning initiatives, profiling workloads, and implementing resource scheduling strategies.
- Develop and maintain automation scripts in Python and Bash for provisioning, monitoring, and troubleshooting.
- Collaborate with data scientists and application developers to optimize code for parallel execution.
- Integrate container orchestration (Docker, Kubernetes) to streamline application deployment and scaling.
- Provide technical guidance and mentorship to junior engineers and cross‑functional teams.
Requirements
- 5+ years of experience in Linux system administration and HPC cluster management.
- Proficiency in performance analysis tools (e.g., perf, gprof, Intel VTune) and job schedulers (SLURM, PBS).
- Strong scripting skills in Python and Bash; experience with CI/CD pipelines.
- Hands‑on experience with containerization (Docker) and orchestration (Kubernetes) in production environments.
- Excellent problem‑solving abilities and a collaborative mindset.
Skills
pythondockerkubernetes