About the Role:
We are seeking a driven NLP Engineer to help scale, optimize, and deploy large language model (LLM)-based solutions in the healthcare domain. The primary focus of this role is building and maintaining production-grade, end-to-end NLP systems, including backend architecture design, inference optimization, and efficient model deployment pipelines. There will be opportunities to train or fine-tune LLMs for specific use cases, but your core responsibility is to ensure that these models run efficiently and reliably at scale in production. Beyond working with cutting-edge LLMs, you will build and maintain NLP pipelines that use already-trained LLMs and embedding models. This includes constructing retrieval-augmented generation (RAG) systems and agentic systems that integrate multiple models and data sources to deliver robust, real-time NLP functionality.
What We Expect You to Bring (These are essentials!):
- Bachelor's or Master's degree in Computer Science or related field.
- At least 2 years of professional experience (or at least 1 year with an advanced degree) building and deploying ML/NLP systems in Python.
- Proficiency with NLP frameworks (e.g., spaCy, Hugging Face Transformers, LangChain), deep learning libraries (e.g., PyTorch), and common data preprocessing techniques.
- Practical experience designing, implementing, and maintaining robust, scalable backend infrastructure for NLP and LLM-based applications.
- Strong knowledge of containerization and version control for building reliable, production-grade systems.
- Experience with large datasets: data cleaning, preprocessing, and structuring.
- Hands-on experience optimizing LLM inference performance using frameworks such as vLLM, TensorRT, and Ray.
- Experience deploying NLP models in production environments, including load balancing and latency reduction.
We Definitely Want You If You Have:
- Familiarity with building retrieval-augmented generation (RAG) pipelines and integrating embedding models into NLP workflows.
- Exposure to agentic systems that combine multiple models or tools for more dynamic, context-aware NLP solutions.
- Understanding of prompt engineering, model fine-tuning, and large-scale inference optimization for LLMs.
What You Will Be Doing:
Production-Grade NLP Systems:
- Design and implement scalable, efficient NLP pipelines leveraging already-trained LLMs and embedding models.
- Integrate RAG and agentic components to enhance the capabilities and adaptability of NLP systems.
Inference Optimization & Deployment:
- Optimize model inference performance, reduce latency, and improve throughput using techniques and frameworks designed for large-scale LLM deployments.
- Implement best practices for containerization, CI/CD, monitoring, and observability to ensure rapid, reliable deployments.
Occasional Model Adaptation:
- As needed, assist with fine-tuning or adapting LLMs to specific healthcare use cases, while maintaining a focus on long-term scalability and performance.
Collaboration & Continuous Improvement:
- Work closely with cross-functional teams—including NLP researchers, backend engineers, product managers, and front-end developers—to deliver high-quality NLP solutions.
- Participate in code reviews, contribute to architectural discussions, and remain current on emerging NLP and LLM optimization techniques.