Onsite
Principal IT Operations Analytics & Problem Management - Metropolitan Transportation Authority
Systems Engineer
Lead enterprise‑wide IT operations analytics and problem management, driving proactive incident resolution, root‑cause analysis, and continuous improvement using advanced monitoring, scripting, and ITIL best practices.
About the role
Key Responsibilities
- Design and implement analytics frameworks to monitor IT operations, identify trends, and predict incidents across the organization.
- Lead problem management initiatives, coordinating cross‑functional teams to conduct root‑cause analysis and develop long‑term solutions.
- Develop and maintain dashboards, reports, and automated alerts using Python, PowerShell, and monitoring platforms.
- Collaborate with ITIL process owners to improve incident, change, and configuration management practices.
- Provide executive‑level insights and recommendations to senior leadership on operational performance and risk mitigation.
Requirements
- 10+ years of experience in IT operations, analytics, or problem management within a large enterprise.
- Deep knowledge of ITIL 4, incident and problem management, and continuous improvement methodologies.
- Proficiency in scripting (Python, PowerShell) and data visualization tools.
- Strong analytical, communication, and stakeholder management skills.
- Experience with monitoring and alerting platforms (e.g., Splunk, Datadog, New Relic) is preferred.