remote

Director - Hyperscale, HPC & Sovereign AI Deployment and Fleet Operations - AMD

Systems Engineer

Lead the design and execution of hyperscale AI and HPC deployments, driving fleet operations and cloud infrastructure to accelerate next‑generation AI workloads across data centers and edge environments.

About the role

Key Responsibilities

Architect and oversee hyperscale AI and HPC deployment strategies, ensuring high availability, performance, and security across global data centers.
Lead cross‑functional teams in the design, implementation, and optimization of AI infrastructure, including GPU clusters, networking, and storage solutions.
Develop and maintain fleet operations processes, automating provisioning, monitoring, and lifecycle management of AI workloads using Kubernetes and cloud-native tools.
Collaborate with product, research, and security teams to integrate cutting‑edge AI models and ensure compliance with sovereign data regulations.
Drive continuous improvement initiatives, leveraging metrics and analytics to enhance system efficiency, cost‑effectiveness, and scalability.

Requirements

10+ years of experience in large‑scale AI or HPC infrastructure, with a proven track record of leading complex deployments.
Deep expertise in cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes, OpenShift).
Strong understanding of GPU architecture, high‑performance networking, and storage technologies.
Excellent leadership, communication, and stakeholder management skills.
Experience with sovereign AI compliance and data residency requirements is a plus.

Skills

Kubernetes

Company: AMD

Department: Operations

Location: Santa Clara, CA, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026