Onsite
Senior Software Engineer - Site Reliability Engineering, Cloud Storage
Software Engineer
Lead the design and automation of scalable, fault‑tolerant cloud storage solutions, driving capacity planning and reliability across distributed systems.
About the role
Key Responsibilities
- Architect and implement highly available, scalable cloud storage services that meet strict reliability and performance SLAs.
- Develop and maintain automation pipelines for capacity planning, provisioning, and scaling of storage resources.
- Design and enforce fault‑tolerance strategies, including data replication, disaster recovery, and self‑healing mechanisms.
- Collaborate with cross‑functional teams to integrate new features while preserving system stability and observability.
- Analyze system metrics, conduct post‑incident reviews, and drive continuous improvement initiatives.
Requirements
- 5+ years of experience in site reliability engineering or a related field, with a strong focus on cloud storage.
- Proficiency in distributed systems design, capacity planning, and automation tooling (e.g., Terraform, Ansible, Kubernetes).
- Deep understanding of fault‑tolerance concepts, data replication, and disaster recovery.
- Hands‑on experience with major cloud platforms (AWS, GCP, Azure) and their storage services.
- Excellent problem‑solving and communication skills, with a proactive approach to continuous improvement.
Skills
software development, system design, problem solving