Remote
Site Reliability Engineer US West - MinIO
This Site Reliability Engineer role focuses on building and maintaining highly available cloud-native infrastructure with Kubernetes, Docker, and AWS. The engineer uses CI/CD pipelines, Python scripting, and monitoring to keep systems reliable and performant.
About the role
Key Responsibilities
- Design, deploy, and manage scalable Kubernetes clusters across AWS environments.
- Implement and maintain CI/CD pipelines for automated build, test, and deployment of microservices.
- Develop and maintain monitoring, alerting, and logging solutions using Prometheus, Grafana, and the ELK stack.
- Automate infrastructure provisioning and configuration with Terraform, Ansible, and Python scripts.
- Collaborate with development teams to troubleshoot performance bottlenecks and implement reliability best practices.
Requirements
- 3+ years of experience in site reliability or DevOps roles.
- Experience with CI/CD tools such as GitLab CI, Jenkins, or ArgoCD.
- Solid understanding of Linux system administration and networking concepts.
Skills
Kubernetes, Docker, AWS, Linux, CI/CD, Python