onsite
Senior Site Reliability Engineer - Realtor.com
Site Reliability Engineer
Lead the reliability and scalability of a high‑traffic real‑estate platform, driving automation, performance, and incident response with Kubernetes, AWS, Docker, Terraform, and monitoring tools such as Prometheus and Grafana.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Terraform and Kubernetes.
- Automate deployment pipelines and CI/CD workflows to accelerate feature delivery while ensuring reliability.
- Implement robust monitoring, alerting, and incident response processes using Prometheus, Grafana, and PagerDuty.
- Collaborate with development teams to optimize application performance, cost, and security.
- Lead post‑mortem analyses, root cause investigations, and continuous improvement initiatives.
Requirements
- 5+ years of experience in site reliability or DevOps roles on large‑scale web platforms.
- Deep expertise with Kubernetes, Docker, and AWS services (EC2, RDS, S3, EKS).
- Proficiency in infrastructure as code (Terraform, CloudFormation) and CI/CD tooling (GitHub Actions, Jenkins).
- Strong scripting skills in Python or Bash and familiarity with monitoring/observability stacks.
- Excellent problem‑solving, communication, and collaboration abilities.
Skills
Kubernetes, AWS, Docker, Terraform, CI/CD