Remote
Mainframe Site Reliability Engineer (SRE) - Kyndryl
Site Reliability Engineer
Lead reliability and resilience efforts for mission‑critical mainframe environments, applying z/OS, CICS, and DB2 expertise to automate operations, monitor performance, and drive continuous improvement.
About the role
Key Responsibilities
- Design, implement, and maintain high‑availability solutions for z/OS, CICS, and DB2 systems.
- Develop and maintain automation scripts (Bash, Python, or similar) to streamline deployment, configuration, and monitoring tasks.
- Collaborate with development and operations teams to define SLAs and to shape capacity planning and performance tuning strategies.
- Implement proactive monitoring, alerting, and incident response workflows using industry‑standard tools.
- Conduct root‑cause analysis, post‑mortem reviews, and continuous improvement initiatives to reduce downtime.
Requirements
- 5+ years of experience in mainframe operations, with deep knowledge of z/OS, CICS, and DB2.
- Proficiency in scripting and automation (Python, Bash, or similar) to support SRE practices.
- Strong understanding of performance tuning, capacity planning, and high‑availability architectures.
- Experience with monitoring and alerting platforms (e.g., Splunk or Dynatrace).
- Excellent problem‑solving skills and a collaborative mindset.
Skills
Kubernetes, Prometheus, Grafana