Remote
Senior Network Development Engineer - Oracle
Software Engineer
Design and implement high‑performance RDMA cluster networking solutions for AI, machine learning, and HPC workloads. In this senior engineering role, you will drive architectural innovation and performance optimization.
About the role
Key Responsibilities
- Architect, develop, and optimize RDMA‑based cluster networking stacks for AI, ML, and HPC applications.
- Collaborate with hardware and software teams to integrate low‑latency networking solutions into compute platforms.
- Perform deep performance analysis, tuning, and debugging of network paths to meet stringent throughput and latency targets.
- Lead the creation of reference designs, benchmarks, and best‑practice documentation for RDMA clusters.
- Mentor junior engineers and contribute to technical roadmaps that advance the organization’s networking capabilities.
Requirements
- 5+ years of experience in network development, with strong expertise in RDMA technologies (e.g., InfiniBand, RoCE).
- Proficiency in C/C++ development on Linux, including kernel‑level networking and driver interfaces.
- Solid understanding of high‑performance computing architectures and AI/ML workload characteristics.
- Demonstrated ability to profile, troubleshoot, and optimize network performance at scale.
- Excellent problem‑solving skills and the ability to work cross‑functionally in a fast‑paced, research‑driven environment.