onsite

Systems Engineer, HPC US & Canada - Mistral AI

About the role

Lead the design, deployment, and optimization of large‑scale HPC infrastructure across cloud and on‑prem environments, ensuring high availability, performance, and scalability for AI workloads.

Key Responsibilities

Architect and maintain petabyte‑scale HPC clusters, integrating cloud services and on‑prem resources to support AI research and production workloads.
Develop and automate deployment pipelines using Python, Bash, and configuration management tools for rapid provisioning and scaling.
Collaborate with software and research teams to optimize performance, troubleshoot bottlenecks, and implement best practices for distributed training and inference.
Monitor system health, capacity, and security, implementing proactive measures to ensure reliability and compliance.
Document infrastructure designs, operational procedures, and knowledge base articles for internal use.

Requirements

5+ years of experience in HPC or large‑scale distributed systems engineering.
Proficiency with Linux system administration, Python scripting, and C++ performance tuning.
Hands‑on experience with cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), and HPC workload management (Slurm).
Strong understanding of networking, storage, and security in high‑performance environments.
Excellent problem‑solving skills and a collaborative mindset.

Skills

Linux, Python, C

Company: Mistral AI

Department: Engineering

Location: Montréal, Canada (role covers US & Canada)

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026