hybrid

Engineering Manager, Model Inference

Abridge is seeking an Engineering Manager to lead and grow its Model Inference team. This role involves owning the technical direction and scaling of inference systems for AI-powered products, ensuring low-latency and high-throughput infrastructure. The manager will lead a team of AI inference engineers, partner with ML Research, and ensure peak efficiency and reliability of systems powering clinician interactions.

About the role

About Abridge

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most—their patients.

Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems.

We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense. We have offices located in the Mission District in San Francisco, the SoHo neighborhood of New York, and East Liberty in Pittsburgh.

The Role

Our generative AI-powered products are transforming the practice of medicine—and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team.

The Inference team owns the end-to-end technical direction of how our models are served: from architecting low-latency, high-throughput infrastructure to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers, partner closely with ML Research and the broader AI Platform, and ensure the systems underpinning every clinician interaction are operating at peak efficiency and reliability.

What You’ll Do

Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
Own the technical direction of our inference systems—making key decisions around batching, throughput, latency, and GPU utilization
Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response
Benchmark and eliminate bottlenecks throughout the inference stack
Partner with ML Research teams on model optimization, quantization, and deployment
Develop APIs for AI inference used by both internal teams and external customers
Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients

What You’ll Bring

5+ years of engineering experience with 1+ years in a technical leadership or management role
Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi/Grouped-Query Attention, and common transformer components)
Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
Familiarity with GPU characteristics, roofline models, and performance analysis
Experience deploying reliable, distributed, real-time systems at scale
Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
Skilled at hiring and mentorship, with a demonstrated track record of helping engineers grow their skills and careers
Strong technical communication and cross-functional collaboration skills
Comfortable giving constructive feedback on technical designs and code reviews
Has thrived in a fast-growing startup and knows how to operate with urgency and focus

Added Bonus

Background in training infrastructure and RL workloads
Skilled in building secure, compliant systems on major cloud platforms (GCP preferred, AWS experience welcome)
Experience with Kubernetes and container orchestration at scale
Published work or contributions to inference optimization research
