onsite
Software Development Engineer, FSx for Lustre - Amazon
Software Engineer
Develop and scale high‑performance storage services for FSx for Lustre, focusing on low‑latency, high‑throughput solutions for GPU‑accelerated AI/ML and HPC workloads using C++, Linux, and AWS technologies.
About the role
Key Responsibilities
- Design, implement, and ship core components of the FSx for Lustre service that deliver terabyte‑per‑second throughput and sub‑millisecond latency.
- Build and maintain highly available, distributed storage systems on Linux, optimizing for IOPS, bandwidth, and scalability.
- Collaborate with cross‑functional teams (hardware, networking, security, and product) to integrate new features and improve overall service reliability.
- Develop performance‑critical code in C++ and conduct rigorous profiling, benchmarking, and tuning to meet strict latency and throughput targets.
- Participate in the on‑call rotation, troubleshoot production incidents, and drive root‑cause analysis and long‑term fixes.
Requirements
- 5+ years of software development experience, primarily in C++ on Linux platforms.
- Strong understanding of distributed systems concepts, networking protocols, and storage architectures.
- Hands‑on experience with AWS services and building cloud‑native, highly available solutions.
- Proven ability to profile, debug, and optimize performance‑critical code at scale.
- Excellent problem‑solving skills and the ability to work effectively in a fast‑paced, collaborative environment.