Onsite
Software Engineering Manager - Production Support Operations - Information Technology Senior Management Forum
Engineering Manager
About the role
Lead a high‑performing production support team, driving reliability, incident response, and continuous improvement across cloud‑native platforms using automation, monitoring, and SRE best practices.
Key Responsibilities
- Oversee end‑to‑end production support operations, ensuring uptime and resilience for critical business platforms.
- Lead incident management, root‑cause analysis, and post‑mortem processes to reduce mean time to recovery (MTTR) and prevent recurrence.
- Implement and evolve SRE practices, including automation, monitoring, and capacity planning across cloud environments.
- Collaborate with engineering, security, and product teams to prioritize reliability improvements and feature releases.
- Mentor and develop a cross‑functional team of engineers, fostering a culture of ownership and continuous learning.
Requirements
- 5+ years of experience in production support or SRE roles, with 2+ years in a managerial capacity.
- Proficiency with cloud platforms (AWS, Azure, or GCP) and modern monitoring/alerting tools (Prometheus, Grafana, Datadog).
- Strong scripting skills (Python, Bash) and familiarity with CI/CD pipelines.
- Excellent communication, problem‑solving, and stakeholder‑management abilities.
- Experience driving process improvements and implementing reliability metrics.