Onsite
Senior Engineer, AI Site Reliability - Fox Corporation
Software Engineer
About the role
The Senior Engineer, AI Site Reliability is responsible for building and operating robust infrastructure and platforms that support live direct‑to‑consumer APIs for major events, using Python, Kubernetes, Docker, and AWS to ensure high availability, scalability, and performance.
Key Responsibilities
- Design, implement, and maintain scalable, highly available infrastructure for live event APIs using Kubernetes, Docker, and AWS services.
- Develop and maintain CI/CD pipelines, monitoring, and alerting to sustain 99.9% uptime and enable rapid incident response.
- Collaborate with data science and product teams to integrate AI models into production workflows, optimizing inference latency and throughput.
- Lead capacity planning, cost optimization, and performance tuning for large‑scale, real‑time streaming workloads.
- Document architecture, runbooks, and best practices; mentor junior engineers on SRE principles.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles, with a strong focus on cloud-native technologies.
- Proficiency in Python, Kubernetes, Docker, and AWS (EKS, ECS, Lambda, CloudWatch).
- Hands‑on experience with API gateways, load balancing, and real‑time data streaming.
- Solid understanding of AI/ML model deployment and monitoring in production.
- Excellent problem‑solving skills, strong communication, and a proactive, collaborative mindset.
Skills
Python, Kubernetes, Docker, AWS