onsite
Site Reliability Engineer II - Motion
Site Reliability Engineer
Mid‑level Site Reliability Engineer focused on automating operations, improving reliability, and supporting large‑scale, cloud‑native systems using Python, Linux, Kubernetes, AWS, and infrastructure‑as‑code tools.
About the role
Key Responsibilities
- Design, implement, and maintain automation scripts and tools to reduce manual operational toil.
- Develop and manage highly available, fault‑tolerant services on AWS, leveraging Kubernetes and container orchestration.
- Build and evolve monitoring, alerting, and observability pipelines (e.g., Prometheus, Grafana) to provide end-to-end visibility into system health.
- Collaborate with development teams to embed reliability best practices into CI/CD pipelines and release processes.
- Participate in incident response, root‑cause analysis, and post‑mortem writing to drive continuous improvement.
Requirements
- 2+ years of experience in site reliability or systems engineering roles.
- Proficiency in Python scripting and Linux system administration.
- Hands‑on experience with Kubernetes, Docker, and cloud platforms such as AWS.
- Familiarity with infrastructure‑as‑code tools (e.g., Terraform) and CI/CD frameworks.
- Strong problem‑solving skills and ability to work in fast‑paced, collaborative environments.
Skills
Python, Linux, Kubernetes, AWS, Terraform, CI/CD