remote
Technical Duty Officer Site Reliability Engineer - eBay
The Site Reliability Engineer is responsible for operating, scaling, and automating eBay's global e‑commerce platform, using Linux, Kubernetes, Python, AWS, Terraform, and monitoring tools to keep it highly available and performant.
About the role
Key Responsibilities
- Design, implement, and maintain highly available services on Kubernetes and AWS infrastructure.
- Develop automation scripts and tooling in Python to streamline deployment, scaling, and incident response.
- Monitor system health using Prometheus, Grafana, and alerting pipelines; troubleshoot and resolve production incidents.
- Collaborate with development teams to embed reliability best practices into the software lifecycle.
- Manage infrastructure as code with Terraform, ensuring reproducible environments and compliance.
Requirements
- 3+ years of experience operating Linux‑based production systems at scale.
- Strong proficiency with Kubernetes orchestration and AWS cloud services.
- Hands‑on scripting/programming skills in Python (or Go) for automation and tooling.
- Experience with infrastructure‑as‑code tools such as Terraform or CloudFormation.
- Familiarity with monitoring and observability stacks (Prometheus, Grafana, Alertmanager) and incident management processes.
Skills
Linux, Kubernetes, Python, AWS, Terraform, Prometheus