On-site
Genomic Data Scientist - Cancer
Data Scientist
Lead the development of AI‑driven pipelines for cancer genomics, applying machine learning and NLP to large‑scale sequencing data while containerizing workflows for reproducible research.
About the role
Key Responsibilities
- Design and implement machine‑learning models to identify cancer‑specific genomic signatures.
- Develop NLP pipelines to extract and curate clinical and scientific text for integration with genomic datasets.
- Containerize bioinformatics workflows using Docker or a similar container tool to ensure reproducibility and scalability across cloud and HPC environments.
- Collaborate with biologists, clinicians, and software engineers to translate research questions into data‑driven solutions.
- Validate models and pipelines on heterogeneous sequencing data, ensuring statistical rigor and compliance with data‑privacy standards.
Requirements
- Advanced degree (MSc/PhD) in Bioinformatics, Computational Biology, Computer Science, or a related field.
- Proficiency in Python and R for data analysis, model building, and visualization.
- Hands‑on experience with machine‑learning frameworks (e.g., scikit‑learn, TensorFlow, PyTorch) and NLP libraries (e.g., spaCy, NLTK).
- Strong background in genomics, particularly cancer genomics, and familiarity with sequencing data formats (FASTQ, BAM, VCF).
- Experience containerizing bioinformatics pipelines with Docker and orchestrating them on cloud or HPC platforms.
Skills
Python, machine learning, Docker, natural language processing