Remote
Software Development Engineer II - Intelligent Cloud Hosting - Amazon
Software Engineer
SDE II focused on building intelligent, self‑healing systems for Amazon’s global cloud hosting platform, using Python, Go, and AWS to ensure reliability and operational excellence.
About the role
Key Responsibilities
- Design, develop, and maintain scalable services in Python and Go that monitor Amazon’s cloud hosting infrastructure and diagnose and remediate incidents across it.
- Implement automated detection and resolution pipelines using AWS services (CloudWatch, Lambda, Step Functions) to reduce mean time to recovery.
- Collaborate with cross‑functional teams to define reliability metrics, run post‑mortems, and drive continuous improvement.
- Integrate distributed tracing, logging, and alerting to provide end‑to‑end visibility into system health.
- Participate in code reviews, performance tuning, and capacity planning for high‑throughput workloads.
Requirements
- 5+ years of software engineering experience in a cloud‑native environment.
- Proficiency in Python and Go with a strong grasp of concurrency and distributed systems.
- Hands‑on experience with AWS infrastructure, including EC2, ECS/EKS, CloudWatch, and Lambda.
- Solid understanding of DevOps practices, CI/CD pipelines, and automated testing.
- Excellent problem‑solving skills and a proactive mindset for reliability engineering.