onsite

AI Research Engineer Multimodal & Vision - Jobgether

Research Engineer

Research‑focused AI Engineer developing multimodal vision‑language models, handling dataset creation, training pipelines, evaluation, and optimization using Python and deep‑learning frameworks.

About the role

Key Responsibilities

Design and implement end‑to‑end multimodal models that integrate visual and textual data.
Build and maintain large‑scale data pipelines for image, video, and language datasets, including annotation and preprocessing.
Develop training scripts and optimization routines in PyTorch or TensorFlow, using GPU acceleration (CUDA) for efficient training.
Conduct rigorous model evaluation, benchmark against state‑of‑the‑art baselines on standard metrics, and iterate to improve performance.
Collaborate with cross‑functional research teams to publish findings and translate prototypes into production‑ready components.

Requirements

Strong programming skills in Python and experience with deep‑learning libraries such as PyTorch or TensorFlow.
Hands‑on experience in computer vision and multimodal learning, including model architectures like Transformers, CNNs, and CLIP‑style systems.
Proficiency with GPU computing (CUDA) and large‑scale training pipelines.
Background in research, demonstrated by publications or contributions to open‑source AI projects.
Ability to work independently, solve complex problems, and communicate results effectively.

Skills

Python, PyTorch, TensorFlow, computer vision, deep learning, CUDA

Company: Jobgether

Department: Engineering

Location: Germany

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026