About the role
We are looking for a highly organized and technically proficient Data Operations Lead to own and scale the operational lifecycle of biomedical data partnerships. In this critical role, you will serve as the bridge between external clinical and research partners, our internal Data team, and the engineering environment that powers our AI foundation models.
What you'll be doing
- Data Partnership Operations & Lifecycle Management: Own the operational lifecycle of external data partnerships following contract signature. Act as the primary operational and technical point of contact for hospitals, biobanks, CROs, and research laboratories. Coordinate onboarding, data delivery timelines, and stakeholder communication to ensure successful execution of partnership milestones.
- Data Transfer & Infrastructure Coordination: Manage secure biomedical data transfers using cloud infrastructure and standardized transfer protocols. Coordinate access management, encryption, and ingestion workflows across storage and transfer channels (AWS S3, SFTP, APIs, and direct upload pipelines). Ensure incoming datasets are delivered, validated, and tracked according to internal governance standards.
- Clinical & Multi-Omics Data Harmonization: Collaborate with internal technical and product teams to define and maintain harmonized data models and metadata standards across complex clinical and multi-modal datasets. Organize and maintain relationships between clinical metadata and associated omics or imaging assets, including genomics, transcriptomics, spatial biology, and pathology data.
- Pipeline Operations & Automation: Work closely with engineering and data teams to configure and maintain lightweight ingestion and QC pipelines. Identify operational bottlenecks and repetitive workflows and convert them into scalable systems, scripts, templates, dashboards, or automation tools that improve operational efficiency and visibility.
- Data Quality Oversight: Coordinate automated and manual quality control checks across incoming datasets. Identify missing data, inconsistencies, corruption, or metadata mismatches and work directly with external partners to resolve issues. Ensure data integrity, traceability, and version control throughout the ingestion process.
- Operational Tracking & Reporting: Maintain a centralized “single source of truth” for all incoming datasets, including ingestion status, completeness, QC status, and milestone tracking. Build and maintain reporting dashboards and operational tools to provide visibility into project progress, ingestion velocity, and operational risks.
- Cross-Functional Collaboration & Communication: Partner closely with Data Science, Engineering, Legal, and Partnership teams to align operational execution with business and scientific priorities. Communicate technical issues clearly to both scientific collaborators and non-technical stakeholders. Provide regular updates on operational risks, blockers, and delivery progress.
- Site Visits & External Partner Engagement: Conduct periodic visits to partner hospitals, biobanks, and laboratories to support onboarding, troubleshoot technical or operational bottlenecks, and strengthen long-term collaborations.
What you'll bring
The successful candidate will have a ‘team-first’ attitude; be highly organized, proactive, and detail-oriented; thrive in a fast-paced and evolving environment; and enjoy solving operational and technical challenges at scale. We value individuals who combine strong project management capabilities with hands-on technical fluency and an understanding of biomedical data ecosystems.
- Biomedical Data Expertise: Strong understanding of clinical and biomedical data structures, including real-world data, clinical trial datasets, and multi-omics data modalities. Familiarity with oncology, immunology, or related therapeutic areas is highly desirable.
- Cloud & Data Infrastructure: Proven experience managing data lifecycles in cloud environments, particularly AWS (S3, the AWS CLI, and access management). Familiarity with secure data transfer protocols and large-scale biomedical data handling workflows.
- Data Wrangling & Technical Skills: Proficiency in Python, along with SQL for querying and transforming datasets. Ability to write lightweight scripts, automate workflows, and interact with APIs or cloud-based systems.
- Project & Stakeholder Management: Demonstrated ability to manage multiple external collaborations and operational workstreams simultaneously. Excellent communication skills, with the ability to translate technical issues into clear guidance for both scientific and non-technical stakeholders.
- Operational Problem Solving: Comfortable working independently in ambiguous environments. Strong analytical and organizational skills with the ability to identify bottlenecks, improve processes, and drive operational efficiency.
- Educational Background: Bachelor’s or Master’s degree in Life Sciences, Bioinformatics, Health Informatics, Computer Science, or a related quantitative field.
How to stand out
- Experience working directly with hospitals, biobanks, laboratories, or clinical research organizations.
- Familiarity with biomedical data standards, anonymization, and compliance frameworks (GDPR, HIPAA).
- Experience managing large-scale biomedical datasets in cloud environments, particularly AWS.
- Knowledge of digital pathology and/or multi-omics data workflows.
- Experience handling genomics, transcriptomics, and imaging file formats (e.g., FASTQ, BAM, VCF, TIFF).
- Experience building operational tracking tools, dashboards, or reporting systems.
- Experience automating operational workflows using scripts, APIs, or lightweight pipelines.
- Proven ability to manage cross-functional and external stakeholder relationships in complex data projects.