onsite

Staff Software Engineer, Inference Platform - Cerebras Systems

Software Engineer

Lead the design and implementation of high‑performance inference services for a wafer‑scale AI platform, driving scalability, latency reduction, and seamless integration with cloud ecosystems.

About the role

Key Responsibilities

Architect and develop scalable inference pipelines that leverage the wafer‑scale architecture to deliver sub‑millisecond latency for large‑scale ML models.
Collaborate with hardware and systems teams to optimize GPU and CPU utilization, memory bandwidth, and inter‑node communication.
Implement and maintain production‑grade services in Python and C++, integrating CUDA kernels for accelerated inference.
Design and enforce robust testing, monitoring, and CI/CD pipelines to ensure reliability and performance at scale.
Mentor junior engineers, conduct code reviews, and drive best practices across the inference platform team.

Requirements

10+ years of software engineering experience, with a strong background in high‑performance computing and distributed systems.
Proficiency in Python, C++, and CUDA; experience with ML frameworks (TensorFlow, PyTorch) and model deployment.
Deep understanding of GPU architecture, memory hierarchies, and performance profiling tools.
Hands‑on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes).
Excellent problem‑solving skills, strong communication, and a passion for pushing the limits of AI hardware.

Skills

python, c, cuda, machine learning

Company: Cerebras Systems

Department: Engineering

Location: Sunnyvale, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Posted June 21, 2026