hybrid

Applied Research Scientist - Foundation Models

Ambient.ai is seeking an Applied Research Scientist to develop the next generation of foundation models for computer vision, specifically focusing on multimodal models for physical security. This role involves full-cycle model development, from pre-training and fine-tuning vision-language models to applying compression techniques for efficient deployment. The scientist will optimize transformer-based models, manage training pipelines, and collaborate with engineering and product teams to integrate models into the platform.

About the role:

Ambient.ai is hiring an Applied Research Scientist to build the next generation of foundation models for computer vision. You will join a team responsible for building multimodal models with state-of-the-art performance on real-world vision benchmarks. In this role, you’ll own full-cycle model development: from pre-training and fine-tuning on image-language data to applying distillation and compression techniques for deployment. This is a hands-on, cross-functional role where your work will directly impact our mission of preventing every security incident possible.

What you'll do:

Develop & Optimize VLMs: Design transformer-based vision-language models that understand images, video, and text, and tune them for real-time inference.
Pre-training & Fine-tuning: Own the full training pipeline—from pre-training on image-text data to fine-tuning for Ambient.ai’s physical security domain and use cases.
Model Compression & Optimization: Apply techniques like distillation, quantization, and pruning to reduce model size and latency, enabling efficient edge deployment.
Leverage Open-Source & Innovate: Use and extend state-of-the-art open-source models. Prototype new architectures and training methods to advance Ambient.ai’s multimodal AI research.
Cross-Team Collaboration: Work with engineering and product teams to integrate models into the platform. Iterate based on real-world feedback and deployment data to improve performance.
Research and Experimentation: Stay current with vision, NLP, and multimodal AI research. Design experiments to test new algorithms and continually enhance our core AI systems.

What you'll bring:

Ph.D. or Master’s in CS, EE, or a related field, with a strong foundation in AI/ML (Ph.D. preferred; Master’s with strong experience also considered)
Proficient in Python/C++ and deep learning frameworks like PyTorch or TensorFlow. Comfortable with large-scale training pipelines
Hands-on experience with CNNs, Transformers, and Vision Transformers (ViT). Strong understanding of vision-language models and how to fine-tune or adapt them
Proven skills in model training and optimization, including fine-tuning on large datasets and applying distillation, quantization, or similar techniques. Experience with foundation or multimodal models is a plus.
Strong problem-solving ability: quick prototyping, diagnosing failure cases, and iterating on solutions
Startup experience preferred: Comfortable with ambiguity, fast iteration, and owning projects end-to-end
