onsite
AI Research Engineer, Multimodal & Vision - Jobgether
Research Engineer
Research‑focused AI Engineer developing multimodal vision‑language models, handling dataset creation, training pipelines, evaluation, and optimization using Python and deep‑learning frameworks.
About the role
Key Responsibilities
- Design and implement end‑to‑end multimodal models that integrate visual and textual data.
- Build and maintain large‑scale data pipelines for image, video, and language datasets, including annotation and preprocessing.
- Develop training scripts and optimization routines in PyTorch or TensorFlow, using GPU acceleration (CUDA) for efficient training.
- Conduct rigorous model evaluation, benchmark against state‑of‑the‑art baselines on standard metrics, and iterate to improve performance.
- Collaborate with cross‑functional research teams to publish findings and translate prototypes into production‑ready components.
Requirements
- Strong programming skills in Python and experience with deep‑learning libraries such as PyTorch or TensorFlow.
- Hands‑on experience in computer vision and multimodal learning, including model architectures like Transformers, CNNs, and CLIP‑style systems.
- Proficiency with GPU computing (CUDA) and large‑scale training pipelines.
- Background in research, demonstrated by publications or contributions to open‑source AI projects.
- Ability to work independently, solve complex problems, and communicate results effectively.
Skills
Python, PyTorch, TensorFlow, computer vision, deep learning, CUDA