onsite

Staff Software Engineer, AI Foundation Models - Merlin Labs

Software Engineer

Senior individual contributor leading the design and productionization of large‑scale foundation model pipelines, including model selection, data engineering, distributed training, and cloud‑native inference infrastructure.

About the role

Key Responsibilities

Design, implement, and maintain end‑to‑end pipelines for training, fine‑tuning, and serving large foundation models in a production environment.
Evaluate and select optimal pre‑trained models, adapting them to aerospace‑specific tasks such as flight control, anomaly detection, and mission planning.
Build scalable data ingestion and preprocessing systems that handle noisy, high‑volume sensor streams and ensure data quality for model training.
Develop and operate distributed training infrastructure on AWS, leveraging Kubernetes, GPU clusters, and automated scaling to reduce time‑to‑train.
Implement robust ML‑Ops practices, including CI/CD for models, monitoring, versioning, and automated rollback mechanisms.
Collaborate with cross‑functional teams (software, hardware, and domain experts) to integrate AI capabilities into flight‑control software stacks.

Requirements

10+ years of software engineering experience with a focus on large‑scale machine learning systems.
Deep expertise in Python and major ML frameworks such as PyTorch or TensorFlow.
Proven experience building cloud‑native, containerized training and inference pipelines on AWS and Kubernetes.
Strong background in data engineering, including handling unstructured, high‑velocity data streams.
Demonstrated ability to design, deploy, and maintain production ML‑Ops workflows, including monitoring, logging, and automated model lifecycle management.

Skills

Python, PyTorch, TensorFlow, AWS, Kubernetes

Company: Merlin Labs

Department: Engineering

Location: Boston, Massachusetts, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Posted June 25, 2026
