Onsite
Staff Engineer - SRE - Redpin
Site Reliability Engineer
About the role
Lead the engineering of highly available, scalable infrastructure for real‑estate payment services, driving automation, reliability, and performance across AWS, Kubernetes, and Terraform environments.
Key Responsibilities
- Architect, build, and maintain production‑grade infrastructure for high‑traffic payment services using AWS, Kubernetes, and Terraform.
- Design and implement CI/CD pipelines, automated testing, and deployment workflows to accelerate feature delivery while ensuring reliability.
- Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve system resilience.
- Collaborate with product and security teams to enforce best practices, compliance, and secure coding standards.
- Mentor and coach junior engineers, fostering a culture of ownership, continuous learning, and high performance.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles, with a strong focus on cloud infrastructure.
- Proficiency with AWS services (EC2, RDS, S3, CloudWatch, IAM) and Kubernetes cluster management.
- Hands‑on experience with Terraform, Helm, and CI/CD tools such as GitHub Actions or Jenkins.
- Deep understanding of monitoring, logging, and alerting (Prometheus, Grafana, ELK stack).
- Excellent problem‑solving skills, strong communication, and a passion for building reliable, scalable systems.
Skills
Kubernetes, AWS, Terraform, CI/CD