remote
Senior Systems Development Engineer - AWS Generative AI & ML Servers - Amazon Web Services
AI Engineer
Design and operate high‑performance AWS server platforms for generative AI, ML training, and HPC workloads, delivering continuous price‑performance improvements for large language models and next‑generation cloud services.
About the role
Key Responsibilities
- Architect, develop, and launch server hardware and firmware solutions that power AWS generative AI and ML workloads.
- Collaborate with product, software, and data‑science teams to optimize performance, scalability, and cost for large‑scale model training and inference.
- Drive continuous improvement of instance types, integrating the latest CPU, GPU, and accelerator technologies.
- Implement monitoring, automation, and debugging tools to ensure high availability and reliability of AI/ML services.
- Contribute to technical roadmaps, evaluate emerging hardware trends, and prototype innovative solutions for future AWS offerings.
Requirements
- 5+ years of experience in systems development, hardware engineering, or low‑level software for high‑performance compute platforms.
- Strong proficiency in C++ and Python for firmware, driver, and automation development.
- Deep knowledge of Linux operating systems, networking, and performance tuning for AI/ML workloads.
- Hands‑on experience with GPU/accelerator architectures, HPC clusters, and large‑scale distributed training systems.
- Demonstrated ability to work cross‑functionally in a fast‑moving cloud environment and deliver production‑grade solutions.
Skills
aws, linux, c, python, machine learning