Remote
Platform Reliability Engineer - eBay
Software Engineer
Join a team that builds and maintains highly available, scalable infrastructure on AWS and Kubernetes, keeping services running for millions of users while driving automation and strong incident response.
About the role
Key Responsibilities
- Design, implement, and operate resilient cloud infrastructure on AWS, leveraging Kubernetes for container orchestration.
- Develop and maintain CI/CD pipelines to accelerate feature delivery and reduce deployment risk.
- Implement comprehensive monitoring, alerting, and logging solutions to detect incidents early and remediate them quickly.
- Lead incident investigations, root cause analysis, and post‑mortem documentation to drive continuous improvement.
- Collaborate with development, security, and product teams to embed reliability best practices across the software lifecycle.
Requirements
- Strong experience with AWS services (EC2, EKS, RDS, CloudWatch, etc.) and Kubernetes cluster management.
- Proficiency in scripting (Python, Bash) and automation tools (Terraform, Ansible).
- Hands‑on experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD) and container registries.
- Solid understanding of monitoring/alerting platforms (Prometheus, Grafana, Datadog) and log aggregation.
- Excellent problem‑solving skills, ability to work under pressure, and a collaborative mindset.
Skills
AWS, Kubernetes, CI/CD, Python