Remote
Software Development Engineer, Machine Learning Networking Performance - Amazon
ML Engineer
Develop and optimize machine‑learning‑driven networking performance solutions for AWS infrastructure, using Python, C++, and cloud services to improve latency, throughput, and reliability at global scale.
About the role
Key Responsibilities
- Design, implement, and deploy ML models that predict and enhance network performance across AWS data centers.
- Develop high‑performance C++ and Python code for real‑time telemetry collection, analysis, and automated remediation.
- Collaborate with networking, systems, and data‑science teams to integrate ML solutions into existing AWS services and tooling.
- Build scalable data pipelines and feature stores using AWS services (e.g., S3, Kinesis, SageMaker) to support model training and inference.
- Monitor, evaluate, and continuously improve model accuracy, latency, and resource utilization in production environments.
Requirements
- Bachelor's degree or higher in Computer Science, Electrical Engineering, or a related field, with 3+ years of software development experience.
- Strong proficiency in Python and C++, plus experience building production‑grade ML systems.
- Deep understanding of networking concepts (TCP/IP, routing, congestion control) and performance metrics.
- Hands‑on experience with AWS services such as EC2, S3, Lambda, and SageMaker.
- Proven ability to work on large‑scale, distributed systems and solve complex, data‑intensive problems.
Skills
python, c, machine learning, aws