Onsite
Application Support Technology Lead Analyst - Vice President - Citi
Software Engineer
Lead SRE observability initiatives across Services Technology: embedding telemetry, defining SLOs, and building dashboards that give developers and platform teams actionable insights.
About the role
Key Responsibilities
- Design and implement end‑to‑end observability solutions for Services Technology, integrating telemetry collection, storage, and analysis.
- Collaborate with SREs, developers, and platform teams to define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Build and maintain dynamic visualizations and dashboards that provide real‑time visibility into production health and performance.
- Drive continuous improvement of monitoring tooling, alerting strategies, and incident response workflows.
- Mentor and guide cross‑functional teams on best practices for observability and reliability engineering.
Requirements
- 5+ years of experience in Site Reliability Engineering or related roles with a strong focus on observability.
- Proficiency with telemetry frameworks (e.g., OpenTelemetry), monitoring platforms (e.g., Prometheus, Grafana), and log aggregation tools.
- Hands‑on experience defining SLOs/SLIs and translating them into actionable metrics.
- Excellent communication skills and ability to influence stakeholders across multiple teams.
- Experience with cloud platforms (AWS, Azure, or GCP) and container orchestration (Kubernetes) is a plus.
Skills
AWS, GCP, Kubernetes, Prometheus, Grafana, Splunk