Remote
MLOps Engineer, LLM/GenAI
Lead end‑to‑end MLOps for large language models, building scalable pipelines on AWS and Azure, optimizing CUDA workloads, and implementing batch inference solutions.
About the role
Key Responsibilities
- Design, develop, and maintain MLOps pipelines for large language models and generative AI applications.
- Implement scalable batch inference solutions on AWS and Azure, ensuring high throughput and low latency.
- Optimize GPU workloads using CUDA, profiling and tuning for performance and cost efficiency.
- Collaborate with data scientists and software engineers to integrate model training, validation, and deployment workflows.
- Automate monitoring, logging, and alerting for model performance and infrastructure health.
Requirements
- Proven experience with MLOps tools (e.g., MLflow, Kubeflow, Airflow) and cloud platforms (AWS, Azure).
- Strong background in CUDA programming and GPU optimization.
- Hands‑on experience with large language models and generative AI frameworks.
- Solid scripting skills in Python and familiarity with CI/CD pipelines.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, collaborative environment.
Skills
MLOps, LLM, AWS, Azure, CUDA