onsite
DevOps / SRE Engineer for Cloud Environments - AI Focus - Scopevisio AG
Site Reliability Engineer
About the role
Lead cloud operations and site reliability engineering, driving automation, scalability, and AI‑enabled observability across AWS and Kubernetes environments.
Key Responsibilities
- Design, implement, and maintain highly available cloud infrastructure on AWS, ensuring performance, security, and cost efficiency.
- Develop and manage CI/CD pipelines that automate deployments and rollbacks, including blue‑green release strategies for microservices.
- Implement robust monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK) to detect and resolve incidents proactively.
- Collaborate with data science teams to integrate MLOps workflows, ensuring that models are deployed and monitored reliably in production.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to continuously improve reliability.
Requirements
- 3+ years of experience in DevOps or SRE roles within cloud‑native environments.
- Hands‑on expertise with AWS services (EKS, ECS, Lambda, CloudFormation) and Kubernetes orchestration.
- Proficiency in scripting (Python, Bash), infrastructure as code (Terraform), and configuration management (Ansible).
- Strong understanding of CI/CD tools (GitLab CI, Jenkins, Argo CD) and observability stacks.
- Experience with MLOps concepts and with integrating AI models into production pipelines is a plus.