Staff Software Engineer, AI Foundation Models - Merlin Labs
Software Engineer · Onsite
Senior individual contributor leading the design and productionization of large‑scale foundation model pipelines, including model selection, data engineering, distributed training, and cloud‑native inference infrastructure.
About the role
Key Responsibilities
- Design, implement, and maintain end‑to‑end pipelines for training, fine‑tuning, and serving large foundation models in a production environment.
- Evaluate and select the most suitable pre‑trained models, then adapt them to aerospace‑specific tasks such as flight control, anomaly detection, and mission planning.
- Build scalable data ingestion and preprocessing systems that handle noisy, high‑volume sensor streams and ensure data quality for model training.
- Develop and operate distributed training infrastructure on AWS, leveraging Kubernetes, GPU clusters, and automated scaling to reduce time‑to‑train.
- Implement robust ML‑Ops practices, including CI/CD for models, monitoring, versioning, and automated rollback mechanisms.
- Collaborate with cross‑functional teams (software, hardware, and domain experts) to integrate AI capabilities into flight‑control software stacks.
Requirements
- 10+ years of software engineering experience with a focus on large‑scale machine learning systems.
- Deep expertise in Python and major ML frameworks such as PyTorch or TensorFlow.
- Proven experience building cloud‑native, containerized training and inference pipelines on AWS and Kubernetes.
- Strong background in data engineering, including handling unstructured, high‑velocity data streams.
- Demonstrated ability to design, deploy, and maintain production ML‑Ops workflows, including monitoring, logging, and automated model lifecycle management.
Skills
Python, PyTorch, TensorFlow, AWS, Kubernetes