Remote

Senior Cloud Site Reliability Engineer - Solace

Site Reliability Engineer

Senior SRE leading reliability for cloud‑native, event‑driven platforms. You will design, automate, and operate scalable infrastructure on AWS using Kubernetes, Terraform, Python, and CI/CD pipelines while ensuring high availability and observability.

About the role

Key Responsibilities

Design, build, and maintain highly available, scalable Kubernetes clusters on AWS for real‑time event streaming services.
Automate infrastructure provisioning and configuration management using Terraform and Python scripts.
Implement and manage CI/CD pipelines to enable rapid, reliable deployments and rollbacks.
Develop comprehensive monitoring, logging, and alerting solutions to ensure service reliability and performance.
Collaborate with development and product teams to define SLOs/SLAs and drive incident response and post‑mortem processes.

Requirements

5+ years of experience in site reliability or DevOps engineering, preferably in cloud‑native environments.
Strong expertise with Kubernetes, AWS services (EKS, EC2, RDS, S3), and infrastructure‑as‑code tools such as Terraform.
Proficiency in scripting/automation using Python and familiarity with CI/CD tools (Jenkins, GitLab CI, GitHub Actions).
Hands‑on experience with observability stacks (Prometheus, Grafana, ELK, CloudWatch) and incident management.
Solid understanding of networking, security, and high‑availability architectures for event‑driven systems.

Skills

Kubernetes, AWS, Terraform, Python, CI/CD

Company: Solace

Department: Engineering

Location: India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 24, 2026