Reliability Engineer / SRE for CIAM - Hays Professional Solutions GmbH, Ulm (onsite)
Site Reliability Engineer
Reliability Engineer for CIAM (Customer Identity and Access Management) platforms, responsible for their high availability, performance, and security using Python, Node.js, and AWS. The role drives automation, monitoring, and incident response to keep user identity services running without interruption.
About the role
Key Responsibilities
- Design, implement, and maintain CIAM infrastructure on AWS, ensuring 99.99% uptime and compliance with security standards.
- Develop automation scripts in Python and Node.js for deployment, scaling, and configuration management.
- Implement and refine monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to detect incidents early and resolve them quickly.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to improve system resilience.
- Collaborate with product, security, and DevOps teams to integrate reliability best practices into CIAM features.
Requirements
- 3+ years of SRE or reliability engineering experience, preferably in identity and access management.
- Hands‑on experience with AWS services (EC2, RDS, Lambda, IAM, CloudWatch).
- Solid understanding of monitoring, alerting, and incident management frameworks.
- Excellent communication skills and a proactive, problem‑solving mindset.