Onsite
Platform Engineer, Model Shaping - Together AI
DevOps Engineer
Platform Engineer focused on building scalable services for tailoring and improving foundation models, leveraging Python, PyTorch, and modern MLOps tooling to enable efficient training, evaluation, and deployment of domain‑specific AI solutions.
About the role
Key Responsibilities
- Design and implement high‑performance backend services that enable developers to select, fine‑tune, and evaluate large language models for specific downstream tasks.
- Develop and maintain pipelines for efficient model training, data ingestion, and continuous evaluation using PyTorch and related ML frameworks.
- Build and operate cloud‑native infrastructure (Docker, Kubernetes, AWS) to support scalable, reproducible experiments and production deployments.
- Collaborate with research scientists to prototype novel training and evaluation methods, translating research prototypes into production‑ready code.
- Implement monitoring, logging, and automated testing to ensure reliability and observability of model‑shaping services.
Requirements
- Strong proficiency in Python and experience with deep learning libraries such as PyTorch or TensorFlow.
- Hands‑on experience building and operating MLOps pipelines, including containerization (Docker) and orchestration (Kubernetes) in cloud environments.
- Solid understanding of machine learning concepts, especially model fine‑tuning, evaluation metrics, and data preprocessing for large language models.
- Ability to write clean, maintainable code and contribute to shared codebases using version control (Git).
- Excellent problem‑solving skills and a collaborative mindset for working with cross‑functional research and engineering teams.
Skills
Python, PyTorch, Machine Learning, MLOps, Docker, Kubernetes