Onsite
Site Reliability Engineer I - Calix
Entry‑level Site Reliability Engineer focused on building and operating scalable cloud infrastructure, automating deployments, and ensuring high availability of the Calix AI‑powered platform using Kubernetes, AWS, and modern DevOps tools.
About the role
Key Responsibilities
- Design, implement, and maintain highly available services on AWS using Kubernetes and Terraform.
- Develop automation scripts and tools in Python to streamline deployment, monitoring, and incident response.
- Collaborate with development and product teams to define reliability targets and improve system performance.
- Implement observability solutions with Prometheus, Grafana, and logging pipelines to detect and resolve issues proactively.
- Participate in on‑call rotations, perform root‑cause analysis, and drive post‑mortem improvements.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Hands‑on experience with Linux systems administration and scripting (Python preferred).
- Familiarity with container orchestration (Kubernetes) and infrastructure‑as‑code tools (Terraform, CloudFormation).
- Understanding of cloud services on AWS, including EC2, S3, RDS, and networking.
- Experience with monitoring, alerting, and CI/CD pipelines (e.g., Prometheus, Grafana, Jenkins, GitHub Actions).
Skills
Python, Kubernetes, AWS, Terraform, Linux, Prometheus, CI/CD