onsite
IT Site Support Engineer - Citi
Software Engineer
Hands‑on SRE Observability Specialist driving telemetry, SLO implementation, and production visualizations across Services Technology, collaborating with developers and platform teams to enhance reliability and insight.
About the role
Key Responsibilities
- Design, implement, and maintain observability solutions for Services Technology environments.
- Embed telemetry and define SLOs in collaboration with SREs, developers, and platform teams.
- Build and refine dashboards and visualizations that provide actionable insights into production systems.
- Participate in incident response, root‑cause analysis, and post‑mortem documentation.
- Advise and train teams on best practices for observability and reliability engineering.
Requirements
- Proven experience in Site Reliability Engineering (SRE) roles.
- Strong knowledge of observability tools (e.g., Prometheus, Grafana, OpenTelemetry).
- Hands‑on experience with SLO definition, monitoring, and alerting.
- Excellent communication skills and the ability to collaborate with cross‑functional teams.
- Experience with cloud platforms (AWS, Azure, or GCP) is a plus.
Skills
Prometheus, Grafana, Splunk, Jira