Remote
Senior Site Reliability Engineer - Autodesk
Site Reliability Engineer
Senior Site Reliability Engineer focused on building and operating reliable, secure, and scalable cloud services for Autodesk GovCloud products using Python, Terraform, Kubernetes, and AWS GovCloud.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, secure, and scalable cloud services in Autodesk GovCloud environments.
- Develop and enforce reliability practices, automation, and engineering standards for production services.
- Collaborate with development teams to integrate CI/CD pipelines, monitoring, and alerting for continuous delivery.
- Lead incident response, root cause analysis, and post‑mortem activities to improve system resilience.
- Drive capacity planning, performance tuning, and cost optimization across cloud resources.
Requirements
- 5+ years of experience in site reliability engineering or related roles in cloud environments.
- Proficiency with AWS GovCloud, Terraform, Kubernetes, and Python scripting.
- Strong knowledge of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, CloudWatch).
- Experience with CI/CD pipelines, automated testing, and deployment strategies.
- Excellent problem‑solving skills and a proactive approach to improving reliability and security.
Skills
Python, Terraform, Kubernetes