Onsite
ML Kernel Performance Engineer, Edge AI and Science - Amazon.com
Software Engineer
Design and optimize high‑performance ML kernels for a next‑generation edge AI compression platform, leveraging CUDA, C++, and Linux profiling tools to maximize compute efficiency on custom neural accelerator silicon.
About the role
Key Responsibilities
- Develop, benchmark, and tune machine‑learning kernels for a proprietary edge AI compression platform.
- Collaborate with hardware architects to align software optimizations with custom neural accelerator silicon capabilities.
- Implement performance‑critical code in C++/CUDA and create Python tooling for rapid experimentation.
- Use Linux profiling tools (e.g., perf, NVIDIA Nsight) to identify bottlenecks and drive 20‑100x improvements in compression efficiency.
- Contribute to cross‑functional design reviews, providing data‑driven recommendations for kernel‑ and system‑level enhancements.
Requirements
- Strong programming experience in C++ and CUDA, with solid Python scripting skills.
- Deep understanding of performance profiling, low‑level optimization, and memory hierarchy on Linux systems.
- Hands‑on experience with machine‑learning frameworks and edge AI workloads.
- Proven ability to work with hardware teams to co‑design software that exploits custom accelerator features.
- BS/MS in Computer Science, Electrical Engineering, or related field; advanced degree preferred.
Skills
Python, C, CUDA, Linux, machine learning