onsite
Staff SRE & DeFi Scalability Lead - ABC Labs
Site Reliability Engineer
Lead the design and operation of highly available, scalable infrastructure for a cutting‑edge DeFi platform, driving reliability, performance, and security across cloud, container, and blockchain services.
About the role
Key Responsibilities
- Architect and maintain resilient, scalable infrastructure for a high‑throughput DeFi platform, ensuring 99.99% uptime and enabling rapid feature rollouts.
- Lead incident response, post‑mortem analysis, and continuous improvement of SRE practices, including monitoring, alerting, and capacity planning.
- Collaborate with blockchain and smart‑contract teams to integrate on‑chain events, data pipelines, and security audits into the observability stack.
- Drive automation of deployment pipelines using Terraform, GitOps, and CI/CD tools, reducing manual effort and deployment risk.
- Mentor and grow a high‑performing SRE team, fostering a culture of reliability, ownership, and cross‑functional collaboration.
Requirements
- 10+ years of experience in site reliability engineering, with a strong focus on cloud-native and containerized environments.
- Deep expertise in Kubernetes, Terraform, AWS, and Go, plus hands‑on experience with monitoring (Prometheus, Grafana) and alerting (PagerDuty, Opsgenie).
- Proven track record building and scaling DeFi or blockchain infrastructure, including smart‑contract integration and on‑chain data ingestion.
- Strong analytical skills, the ability to troubleshoot complex distributed systems, and a passion for automation and process improvement.
- Excellent communication skills, a collaborative mindset, and ease working with engineering, product, and security teams.
Skills
Kubernetes, Terraform, AWS, Go, smart contracts