onsite

NLP Engineer

The NLP Engineer will focus on scaling, optimizing, and deploying LLM-based solutions in healthcare, specifically building and maintaining production-grade, end-to-end NLP systems. This includes backend architecture, inference optimization, and efficient model deployment pipelines, as well as constructing RAG and agentic systems for robust, real-time NLP functionalities.

About the Role:

We are seeking a driven NLP Engineer who can help scale, optimize, and deploy large language model (LLM)-based solutions within the healthcare domain. The primary focus of this role is building and maintaining production-grade, end-to-end NLP systems, including backend architecture design, inference optimization, and efficient model deployment pipelines. While there will be opportunities to train or fine-tune LLMs for specific use cases, your core responsibility is to ensure that these models run at scale, efficiently, and reliably in production environments.

In addition to working with cutting-edge LLMs, you will build and maintain NLP pipelines that use already-trained LLMs and embedding models. This includes constructing retrieval-augmented generation (RAG) systems and agentic systems that integrate multiple models and data sources to deliver robust, real-time NLP functionality.

What We Expect You to Bring (These are essentials!):

Bachelor's or Master's degree in Computer Science or related field.
2+ years of professional experience (or 1+ year with an advanced degree) in building and deploying ML/NLP systems using Python.
Proficiency in working with NLP frameworks (e.g., spaCy, Hugging Face Transformers, LangChain), deep learning libraries (e.g., PyTorch), and common data preprocessing techniques.
Practical experience in designing, implementing, and maintaining robust, scalable backend infrastructures for NLP and LLM-based applications.
Strong knowledge of containerization and version control for building reliable, production-grade systems.
Experience with large datasets: data cleaning, preprocessing, and structuring.
Hands-on experience optimizing LLM inference performance using frameworks such as vLLM, TensorRT, or Ray.
Experience deploying NLP models in production environments, including load balancing and latency reduction.

We Definitely Want You If You Have:

Familiarity with building retrieval-augmented generation (RAG) pipelines and integrating embedding models into NLP workflows.
Exposure to agentic systems that combine multiple models or tools for more dynamic, context-aware NLP solutions.
Understanding of prompt engineering, model fine-tuning, and large-scale inference optimization for LLMs.

What You Will Be Doing:

Production-Grade NLP Systems:

Design and implement scalable, efficient NLP pipelines leveraging already-trained LLMs and embedding models.
Integrate RAG and agentic components to enhance the capabilities and adaptability of NLP systems.

Inference Optimization & Deployment:

Optimize model inference performance, reduce latency, and improve throughput using techniques and frameworks designed for large-scale LLM deployments.
Implement best practices for containerization, CI/CD, monitoring, and observability to ensure rapid, reliable deployments.

Occasional Model Adaptation:

As needed, assist with fine-tuning or adapting LLMs to specific healthcare use cases, while maintaining a focus on long-term scalability and performance.

Collaboration & Continuous Improvement:

Work closely with cross-functional teams—including NLP researchers, backend engineers, product managers, and front-end developers—to deliver high-quality NLP solutions.
Participate in code reviews, contribute to architectural discussions, and remain current on emerging NLP and LLM optimization techniques.

Skills