onsite
Machine Learning Infrastructure Engineer
As a Machine Learning Infrastructure Engineer, you will design, operate, and optimize GPU infrastructure for model hosting and serving. This role involves building scalable model serving systems, implementing multi-model routing, and owning the end-to-end model lifecycle, while driving inference optimizations and building self-service platforms.
About the role
We are seeking a Machine Learning Infrastructure Engineer to design and operate GPU infrastructure and model serving systems. You will own the model lifecycle end to end, from download and deployment through monitoring and scaling, and drive inference optimizations such as quantization, batching, and caching.
Responsibilities
- Design and operate GPU infrastructure for model hosting, including provisioning, scheduling, and cost optimization across cloud and on-premise environments.
- Build and scale model serving systems using vLLM, TensorRT-LLM, Triton, or equivalent, supporting real-time inference with strong latency and availability guarantees.
- Implement multi-model routing to serve multiple models across modalities (text, voice, code, vision) on shared infrastructure.
- Own the model lifecycle end to end: download, deploy, serve, monitor, swap, and scale.
- Drive inference optimization, including quantization strategies (AWQ, GPTQ), batching, caching, and cold-start reduction.
- Build self-service infrastructure platforms where teams provision compute, storage, and model endpoints through APIs and control planes.
- Implement infrastructure-as-code at scale using Terraform, Pulumi, or CDK.
- Build observability and reliability for inference systems: SLIs/SLOs, GPU utilization monitoring, latency tracking, automated capacity planning, and alerting.
- Define platform standards and governance, including multi-tenant isolation, cost attribution, and resource quotas.
- Lead architectural design and influence engineering direction across the AI infrastructure stack.
Skills
GPU, vLLM, TensorRT-LLM, Triton, quantization, AWQ, GPTQ, API, Terraform, Pulumi, CDK