remote
Manager 2, Technical Escalations Engineering - Datadog
Software Engineer
Lead a high‑impact Technical Escalations Engineering team, driving deep troubleshooting, incident response, and customer success across Datadog’s platform using Python, Node.js, AWS, Kubernetes, and Datadog tooling.
About the role
Key Responsibilities
- Lead and mentor a team of senior engineers to resolve complex, high‑priority customer escalations across the Datadog platform.
- Own the full incident lifecycle: triage, root‑cause analysis, resolution, and post‑mortem documentation.
- Collaborate with product, reliability, and support teams to drive feature improvements and preventive measures.
- Develop and maintain automation scripts and tooling in Python and Node.js to accelerate diagnostics and remediation.
- Engage directly with enterprise customers, delivering technical guidance and ensuring a superior support experience.
Requirements
- 5+ years of experience in a technical support or engineering role, with a strong background in cloud (AWS) and container orchestration (Kubernetes).
- Proficiency in Python and Node.js for scripting and automation.
- Deep understanding of distributed systems, monitoring, and observability concepts.
- Excellent communication skills and a customer‑centric mindset.
- Experience with incident management tools and post‑mortem processes.
Skills
pythonnodejsawskubernetesdatadog