Remote
Staff MLOps Engineer - NBCUniversal
MLOps Engineer
About the role
Lead the design, implementation, and scaling of end‑to‑end MLOps pipelines, leveraging Python, Kubernetes, Docker, and AWS to deliver reliable, production‑grade machine‑learning services.
Key Responsibilities
- Architect and build robust, scalable MLOps platforms on Kubernetes and AWS for high‑throughput model training and inference.
- Develop CI/CD pipelines using Terraform, Docker, and industry‑standard tools to automate model versioning, testing, and deployment.
- Collaborate with data scientists and software engineers to integrate ML frameworks (e.g., TensorFlow, PyTorch) and tracking tools such as MLflow.
- Implement monitoring, logging, and alerting solutions to ensure model performance, reliability, and compliance in production.
- Drive best practices for reproducibility, security, and cost optimization across the ML lifecycle.
Requirements
- 5+ years of hands‑on experience building MLOps infrastructure, preferably in a media or streaming environment.
- Strong proficiency in Python and container orchestration with Kubernetes and Docker.
- Deep knowledge of cloud services (AWS EC2, S3, SageMaker, EKS) and IaC tools such as Terraform.
- Experience designing CI/CD pipelines for model training, validation, and deployment.
- Familiarity with ML lifecycle tools (MLflow, Kubeflow, Airflow) and monitoring frameworks.
Skills
Python, Kubernetes, Docker, AWS, Terraform, CI/CD, MLflow