remote

MLOps Engineer LLM/GenAI

Lead end‑to‑end MLOps for large language models, building scalable pipelines on AWS and Azure, optimizing CUDA workloads, and implementing batch inference solutions.

About the role

Key Responsibilities

Design, develop, and maintain MLOps pipelines for large language models and generative AI applications.
Implement scalable batch inference solutions on AWS and Azure, ensuring high throughput and low latency.
Optimize GPU workloads using CUDA, profiling and tuning them for performance and cost efficiency.
Collaborate with data scientists and software engineers to integrate model training, validation, and deployment workflows.
Automate monitoring, logging, and alerting for model performance and infrastructure health.

Requirements

Proven experience with MLOps tools (e.g., MLflow, Kubeflow, Airflow) and cloud platforms (AWS, Azure).
Strong background in CUDA programming and GPU optimization.
Hands‑on experience with large language models and generative AI frameworks.
Solid scripting skills in Python and familiarity with CI/CD pipelines.
Excellent problem‑solving skills and ability to work in a fast‑paced, collaborative environment.

Skills

MLOps, LLM, AWS, Azure, CUDA

Department: Research

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 23, 2026