onsite
SRE/DevOps Engineer - Versana
Site Reliability Engineer
About the role
Lead the design, deployment, and maintenance of scalable, highly available infrastructure for a real‑time loan data platform using Kubernetes, AWS, and Terraform, while ensuring reliability, performance, and continuous delivery.
Key Responsibilities
- Architect, implement, and manage containerized services on Kubernetes clusters to support real‑time data ingestion and processing.
- Design and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools to automate build, test, and deployment workflows.
- Provision and manage AWS resources (EKS, EC2, RDS, S3, CloudWatch) with infrastructure‑as‑code (IaC) tools such as Terraform or CloudFormation.
- Implement robust monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK stack) to ensure system health and rapid incident response.
- Collaborate with development teams to enforce best practices for code quality, security, and performance.
- Participate in on‑call rotations, troubleshoot production incidents, and conduct post‑mortem analyses to drive continuous improvement.
Requirements
- 5+ years of experience in SRE or DevOps roles within high‑scale, data‑centric environments.
- Deep expertise in Kubernetes, Docker, and container orchestration best practices.
- Proficient with AWS services and infrastructure-as-code tools (Terraform, CloudFormation).
- Strong scripting skills (Python, Bash) and familiarity with CI/CD tooling.
- Excellent problem‑solving abilities, strong communication skills, and a proactive, collaborative mindset.
Skills
Kubernetes, AWS, Terraform, CI/CD