Remote
Site Reliability Engineer (Full-Stack) - UpSmart Solutions
Senior Site Reliability Engineer with full‑stack expertise, ensuring high availability, performance, and scalability of a high‑traffic e‑commerce platform using AWS, Docker, Kubernetes, Python, and robust CI/CD pipelines.
About the Role
Key Responsibilities
- Design, implement, and maintain scalable infrastructure on AWS to support a high‑traffic e‑commerce website.
- Develop and manage CI/CD pipelines, ensuring rapid, reliable deployments across Docker and Kubernetes environments.
- Monitor system health, performance, and capacity using Prometheus, Grafana, and custom alerts; resolve incidents quickly and proactively address issues before they affect users.
- Automate operational tasks with Python scripts and Terraform, reducing manual effort and improving reliability.
- Collaborate with development teams to embed SRE best practices into the full‑stack development lifecycle.
Requirements
- 5–8 years of experience as a Site Reliability Engineer or DevOps Engineer.
- Strong proficiency in AWS services (EC2, RDS, ELB, CloudWatch) and container orchestration (Docker, Kubernetes).
- Hands‑on experience with CI/CD tools (GitLab CI, Jenkins, ArgoCD) and infrastructure as code (Terraform, CloudFormation).
- Excellent scripting skills in Python and Bash, with a focus on automation and observability.
- Solid understanding of web technologies (Node.js, React, REST APIs) and performance tuning.
Skills
AWS, Docker, Kubernetes, Python, CI/CD