remote
Senior Staff Engineer, Machine Learning Platform - PubMatic
ML Engineer
Senior Staff Engineer leading the design and scaling of a global ML platform, driving end‑to‑end pipelines for petabyte‑scale data, GPU‑accelerated inference, and production‑ready models across a high‑volume ad ecosystem.
About the role
Key Responsibilities
- Architect and maintain a scalable ML platform that supports data ingestion, feature engineering, model training, and inference for trillions of ad impressions.
- Integrate GPU‑accelerated serving and compute tooling (e.g., NVIDIA Triton Inference Server, CUDA) to deliver low‑latency, high‑throughput inference services.
- Collaborate with ML Engineers, Data Scientists, and Product teams to define experiment pipelines, model versioning, and deployment workflows.
- Design and implement robust monitoring, logging, and alerting for model performance and data quality across distributed environments.
- Evaluate and adopt emerging ML infrastructure technologies, ensuring alignment with industry best practices and cost‑efficiency goals.
Requirements
- 10+ years of software engineering experience with a strong focus on machine learning infrastructure.
- Proficiency in Python and PyTorch or TensorFlow, plus experience deploying models with NVIDIA Triton Inference Server.
- Deep knowledge of GPU computing, distributed systems, and cloud platforms (AWS, GCP, or Azure).
- Hands‑on experience with MLOps tools (MLflow, Kubeflow, Airflow) and CI/CD pipelines for model deployment.
- Excellent communication skills and a proven track record of leading cross‑functional technical initiatives.