remote
Lead Site Reliability Engineer - Database Administration - SimCorp
Site Reliability Engineer
Lead a high‑performing SRE team focused on database reliability, automation, and performance tuning using Linux, Kubernetes, Terraform, and modern monitoring tools in a FinTech environment.
About the role
Key Responsibilities
- Design, implement, and maintain highly available PostgreSQL and MySQL clusters on Kubernetes and cloud platforms.
- Develop infrastructure‑as‑code pipelines with Terraform and automate operational tasks using Python and Bash.
- Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve service reliability.
- Implement observability solutions (Prometheus, Grafana, CloudWatch) and define SLO and SLA metrics for database services.
- Mentor junior SREs and DBA staff, fostering a culture of collaboration, knowledge sharing, and continuous learning.
Requirements
- 5+ years of experience in site reliability engineering or database administration, with deep expertise in PostgreSQL and MySQL.
- Strong command of Linux systems, container orchestration (Kubernetes), and infrastructure‑as‑code tools (Terraform, Ansible).
- Proficiency in scripting/automation languages such as Python or Bash.
- Hands‑on experience with monitoring, alerting, and performance tuning using Prometheus, Grafana, or similar tools.
- Excellent problem‑solving, communication, and leadership skills in a fast‑paced FinTech environment.
Skills
Linux, Kubernetes, Terraform, PostgreSQL, MySQL, Prometheus, Python