Onsite

Genomic Data Scientist - Cancer

Data Scientist

Lead the development of AI‑driven pipelines for cancer genomics, applying machine learning to large‑scale sequencing data and NLP to clinical and scientific text, and containerizing workflows for reproducible research.

About the role

Key Responsibilities

Design and implement machine‑learning models to identify cancer‑specific genomic signatures.
Develop NLP pipelines to extract and curate clinical and scientific text for integration with genomic datasets.
Containerize bioinformatics workflows using Docker (or similar) to ensure reproducibility and scalability across cloud and HPC environments.
Collaborate with biologists, clinicians, and software engineers to translate research questions into data‑driven solutions.
Validate models and pipelines on heterogeneous sequencing data, ensuring statistical rigor and compliance with data‑privacy standards.

Requirements

Advanced degree (MSc/PhD) in Bioinformatics, Computational Biology, Computer Science, or related field.
Proficiency in Python and R for data analysis, model building, and visualization.
Hands‑on experience with machine‑learning frameworks (e.g., scikit‑learn, TensorFlow, PyTorch) and NLP libraries (e.g., spaCy, NLTK).
Strong background in genomics, particularly cancer genomics, and familiarity with sequencing data formats (FASTQ, BAM, VCF).
Experience containerizing bioinformatics pipelines with Docker and orchestrating them on cloud or HPC platforms.

Skills

Python, machine learning, Docker, natural language processing

Department: Research

Location: London, United Kingdom

Experience: 3+ years

Tenure: Full-time

Level: Mid-level

Posted June 26, 2026