onsite
Software Engineer, Inference Platform - Cerebras Systems
Software Engineer
Develop high‑performance inference software for a wafer‑scale AI chip, leveraging Python, C++, and CUDA to optimize distributed workloads and accelerate large‑scale ML applications.
About the role
Key Responsibilities
- Design, implement, and maintain inference pipelines that run on a wafer‑scale AI chip, ensuring optimal utilization of the device’s compute resources.
- Collaborate with hardware and systems teams to translate architectural specifications into efficient, scalable software solutions.
- Optimize performance‑critical code in C++ and CUDA, using low‑level profiling tools to improve latency and throughput.
- Integrate machine‑learning frameworks (e.g., TensorFlow, PyTorch) with the inference platform, providing seamless deployment for end users.
- Develop automated testing, continuous integration, and deployment pipelines to keep releases reliable and iteration fast.
Requirements
- Strong experience in C++ and CUDA for high‑performance computing.
- Proficiency in Python and familiarity with ML frameworks.
- Solid understanding of distributed systems and parallel programming concepts.
- Experience with profiling, debugging, and performance tuning on GPU architectures.
- Excellent problem‑solving skills and a collaborative mindset.
Skills
python, c, cuda, machine learning