remote
Systems Development Engineer - AWS Generative AI & ML Servers - Amazon.com
AI Engineer
Design, build, and operate high‑performance AWS cloud services for generative AI and for machine‑learning training and inference, delivering continuous price‑performance improvements for large‑scale LLM workloads.
About the role
Key Responsibilities
- Design and implement core AWS services that power generative AI and ML workloads, focusing on performance, scalability, and cost efficiency.
- Develop and maintain server‑side software for AI/ML accelerators, integrating with AWS instance types and hardware stacks.
- Collaborate with hardware, systems, and ML teams to optimize training and inference pipelines for multi‑billion‑parameter models.
- Drive continuous improvement of cloud offerings through performance benchmarking, profiling, and automated testing.
- Participate in the full lifecycle of service delivery, from prototype and proof‑of‑concept to production deployment and operations.
Requirements
- Strong programming experience in Python and C++ for systems‑level development.
- Deep understanding of AWS services, cloud infrastructure, and virtualization technologies.
- Hands‑on experience with machine‑learning frameworks, large‑scale model training, and generative AI concepts.
- Proven ability to work on high‑performance computing (HPC) workloads and optimize them for latency, throughput, and cost.
- Excellent problem‑solving skills and ability to collaborate across hardware, software, and research teams.
Skills
AWS, Python, C, machine learning, generative AI