Remote
Software Development Engineer II - Intelligent Cloud Hosting - Amazon.com
Software Engineer
SDE II role focused on building intelligent, automated systems that detect, diagnose, and resolve incidents across Amazon’s large‑scale cloud hosting infrastructure, using Python, AWS, and distributed systems principles.
About the role
Key Responsibilities
- Design, develop, and maintain scalable, high‑availability services that monitor and remediate incidents across hundreds of thousands of cloud services.
- Implement intelligent automation and machine‑learning models that predict and prevent outages and reduce mean time to recovery (MTTR).
- Collaborate with cross‑functional teams to integrate new features into the global cloud hosting platform, ensuring reliability and performance.
- Participate in on‑call rotations, providing rapid incident response and root‑cause analysis for mission‑critical services.
- Continuously improve monitoring, alerting, and observability tooling to enhance operational excellence.
Requirements
- 3+ years of software engineering experience in a cloud or distributed systems environment.
- Proficiency in Python and AWS services (EC2, Lambda, CloudWatch, S3, etc.).
- Strong understanding of distributed architecture, fault tolerance, and incident response best practices.
- Experience with automation, scripting, and DevOps tooling (CI/CD, Terraform, Docker).
- Excellent problem‑solving skills and a proactive, data‑driven mindset.