onsite
Senior Site Reliability Engineer - Intone Networks
About the role
As a Senior Site Reliability Engineer, you will drive reliability, performance, and automation for client‑facing software systems using Python, Go, Kubernetes, and cloud observability tools.
Key Responsibilities
- Design, implement, and maintain scalable monitoring, alerting, and incident response pipelines using Prometheus, Grafana, and AWS CloudWatch.
- Automate operational tasks and deployments with Python, Go, and CI/CD pipelines to reduce toil and accelerate release cycles.
- Collaborate with engineering, architecture, and product teams to define reliability SLAs, SLOs, and production readiness criteria.
- Lead post‑incident reviews, root‑cause analysis, and continuous improvement initiatives to enhance system resilience.
- Manage Kubernetes clusters, ensuring high availability, efficient resource utilization, and secure configuration.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with a strong background in cloud-native infrastructure.
- Proficiency in Python and Go for scripting and automation.
- Hands‑on experience with Kubernetes, Prometheus, Grafana, and AWS services.
- Deep understanding of CI/CD pipelines, GitOps, and infrastructure as code.
- Excellent problem‑solving skills, strong communication, and a proactive, collaborative mindset.
Skills
Python, Go, Kubernetes, Prometheus, Grafana, AWS, CI/CD