remote
Site Reliability Engineer - Cboe Global Markets
Site Reliability Engineer responsible for designing, automating, and operating highly available trading infrastructure using Linux, cloud (AWS), container orchestration (Kubernetes), and infrastructure‑as‑code tools.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, low‑latency services that support core trading and market data platforms.
- Develop automation scripts and CI/CD pipelines (Python, Terraform) to provision, configure, and scale infrastructure on AWS.
- Manage Kubernetes clusters, ensuring reliable deployment, observability, and performance of micro‑services.
- Implement monitoring, alerting, and incident response processes using Prometheus, Grafana, and related tooling.
- Collaborate with development, security, and operations teams to drive reliability best practices and continuous improvement.
Requirements
- 3+ years of experience in site reliability, DevOps, or systems engineering roles.
- Strong proficiency with Linux systems and scripting in Python.
- Hands‑on experience with Kubernetes orchestration and cloud platforms, preferably AWS.
- Expertise in infrastructure‑as‑code tools such as Terraform, and in CI/CD pipelines (Jenkins, GitLab CI, or similar).
- Solid understanding of monitoring, logging, and alerting frameworks (Prometheus, Grafana, ELK).
Skills
Linux, Python, Kubernetes, Terraform, AWS, CI/CD, Prometheus