onsite

Site Reliability Engineer (SRE) / Platform Engineer - Codevian Technologies Pvt Ltd

Site Reliability Engineer

Lead the design, deployment, and operation of scalable AWS infrastructure, driving Kubernetes (EKS) adoption and GitOps automation to deliver resilient, high‑availability services.

About the role

Key Responsibilities

Design, provision, and maintain large‑scale AWS environments (EKS, EC2, RDS Aurora, ElastiCache, Control Tower) to support production workloads.
Lead Kubernetes (multi‑cluster, multi‑environment) migration and day‑to‑day operations, ensuring high availability and performance.
Implement Infrastructure as Code with Terraform, integrating GitOps tools such as ArgoCD, Atlantis, and custom pipelines for automated, repeatable deployments.
Define and enforce SLI/SLO frameworks, enhancing observability, monitoring, and incident response across the platform.
Collaborate with development teams to embed reliability best practices, including auto‑scaling with Karpenter and event‑driven scaling via KEDA.

Requirements

5+ years of experience in cloud operations, with deep expertise in AWS and Kubernetes.
Proficient in Terraform, GitOps workflows, and CI/CD tooling (ArgoCD, Atlantis).
Strong scripting skills (Python, Bash) and familiarity with monitoring/alerting stacks (Prometheus, Grafana, Alertmanager).
Hands‑on experience with Karpenter, KEDA, and container orchestration best practices.
Excellent problem‑solving, communication, and collaboration abilities in a fast‑paced environment.

Skills

AWS, Kubernetes, Terraform

Company: Codevian Technologies Pvt Ltd

Department: Engineering

Location: West Bengal, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 20, 2026