Onsite

Principal Engineer, Site Reliability Engineering and AIOps - Wells Fargo

Software Engineer

Lead enterprise‑wide Site Reliability Engineering and AIOps initiatives, defining reliability strategy, reference architectures, and automation standards to embed resilience across a large application portfolio.

About the role

Key Responsibilities

Architect and implement enterprise‑scale SRE and AIOps frameworks, including SLOs, error budgets, and incident response playbooks.
Drive full‑stack observability across multiple lines of business, selecting and integrating monitoring, tracing, and log analytics tools.
Lead cross‑functional teams to embed reliability best practices into the software delivery lifecycle and operating model.
Develop and maintain reference architectures, engineering standards, and automation pipelines to accelerate reliability improvements.
Mentor and coach engineering teams on SRE principles, incident management, and continuous improvement.

Requirements

10+ years of experience in large‑scale distributed systems, with deep expertise in SRE and AIOps.
Proven track record designing and deploying observability, incident response, and automation solutions at enterprise scale.
Strong knowledge of cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
Excellent communication skills and ability to influence stakeholders across multiple business units.
Experience with infrastructure as code (IaC), CI/CD, and modern monitoring and alerting stacks (Prometheus, Grafana, ELK, etc.).

Skills

Python, Java, Ansible, Linux, Prometheus, Grafana, Splunk, Agile

Company: Wells Fargo

Department: Engineering

Location: Telangana, India

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted June 19, 2026