onsite
Staff Software Engineer, Inference Platform - Cerebras Systems
Software Engineer
Lead the design and implementation of high‑performance inference services for a wafer‑scale AI platform, driving scalability, latency reduction, and seamless integration with cloud ecosystems.
About the role
Key Responsibilities
- Architect and develop scalable inference pipelines that leverage the wafer‑scale architecture to deliver sub‑millisecond latency for large‑scale ML models.
- Collaborate with hardware and systems teams to optimize GPU and CPU utilization, memory bandwidth, and inter‑node communication.
- Implement and maintain production‑grade services in Python and C++, integrating CUDA kernels for accelerated inference.
- Design and enforce robust testing, monitoring, and CI/CD pipelines to ensure reliability and performance at scale.
- Mentor junior engineers, conduct code reviews, and drive best practices across the inference platform team.
Requirements
- 10+ years of software engineering experience, with a strong background in high‑performance computing and distributed systems.
- Proficiency in Python, C++, and CUDA; experience with ML frameworks (TensorFlow, PyTorch) and model deployment.
- Deep understanding of GPU architecture, memory hierarchies, and performance profiling tools.
- Hands‑on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes).
- Excellent problem‑solving skills, strong communication, and a passion for pushing the limits of AI hardware.
Skills
python, c, cuda, machine learning