Remote
MTS 1 Site Reliability Engineer - eBay
Site Reliability Engineer
Mid‑level Site Reliability Engineer responsible for designing, automating, and operating highly available services on cloud infrastructure, using Kubernetes, Terraform, Python, and Go.
About the role
Key Responsibilities
- Design, implement, and maintain scalable, fault‑tolerant services on AWS using infrastructure‑as‑code (Terraform).
- Develop automation and tooling in Python and Go to improve deployment pipelines and incident response.
- Operate and troubleshoot Kubernetes clusters, ensuring high availability and performance.
- Implement monitoring, alerting, and observability solutions with Prometheus, Grafana, and related tools.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
Requirements
- 3+ years of experience in Linux system administration and cloud environments (AWS preferred).
- Strong proficiency in Kubernetes orchestration and containerization concepts.
- Hands‑on experience with Terraform or similar IaC tools.
- Programming skills in Python and Go for automation and tooling.
- Familiarity with monitoring stacks such as Prometheus/Grafana and incident management processes.
Skills
Linux, Kubernetes, Terraform, Python, Go, AWS, Prometheus