Overview
Miraxis is building the rights-cleared data factory for robotics and physical AI. A key differentiator is turning messy, heterogeneous real-world robotics data into training-ready datasets with verifiable quality.
As Robotics Data Pipeline Engineer, you will own the multimodal pipeline layer: ingestion, transformation, validation, QA gates, and delivery packaging. You should be able to talk shop with vendors/partners/clients on best practices (formats, sync, calibration metadata, labeling, eval outputs) and also build the tooling to manipulate and audit datasets directly.
What you'll do
- Build and operate multimodal pipelines for robotics/physical AI datasets: ingestion, transformation, validation, and delivery packaging.
- Define "training-ready" as a set of enforceable checks: alignment validation, integrity checks, schema enforcement, and reproducibility standards.
- Build tooling to inspect, transform, and audit datasets (large files, long-running jobs, real-world edge cases).
- Collaborate with Ops/Delivery and Hardware & Integration to ensure capture metadata and formats support downstream usability.
- Work with partners/vendors/clients to align on formats and best practices; turn external constraints into concrete pipeline requirements.
- Maintain clear documentation (schemas, runbooks, data contracts) so a remote team can operate consistently.
What we're looking for
- Hands-on experience with robotics/physical AI datasets (multimodal: video + sensors + proprioception) and their failure modes.
- Strong Python and data engineering instincts: validation, reproducibility, and careful handling of messy real-world data.
- Comfort working at the intersection of software and the robotics domain: able to reason about timing/sync, calibration metadata, and the practicalities of capture pipelines.
- Able to communicate clearly with both engineers and external stakeholders, and to convert ambiguity into executable specs.
Nice to have
- Experience with ROS/ROS2 data formats (bags) or other robotics logging systems.
- Familiarity with simulation/teleoperation datasets, annotation/labeling workflows, and evaluation harnesses.
- Experience building QA frameworks that surface issues early (before downstream training).
Working style & expectations
- Remote-friendly, high-ownership role. Writing and maintaining clear docs is part of the job.
- Travel may be required occasionally for partner debugging and alignment.
Originally posted on Himalayas