For Engineers -- ML Engineering

Build the systems that make frontier AI possible.

Training platforms, inference stacks, GPU schedulers, data pipelines, eval infrastructure. ML engineering at frontier labs in 2026 -- what the work actually looks like, what it pays, and the eight specializations the labs are hiring for.


8 SPECIALIZATIONS tracked

$620K--$1.35M total comp band (sr. IC)

12 FRONTIER LABS in our network

UPDATED MAY 2026

// Where ML engineering sits

Different rubric. Same bar.

Research figures out what to train. ML engineering figures out how. Senior IC bands at frontier labs are comparable across the two -- sometimes higher on the engineering side for the right specialist -- but the work, the panels, and the daily problems are not the same.

AI Research -- PAPER-DRIVEN

Picks the question, designs the method, runs the experiment, writes it up. Grades on research taste, methodological rigor, and shipped research.

If you'd rather argue about reward shaping than about GPU memory layout, this is your column. See the research roles page.

ML Engineering -- SYSTEMS-DRIVEN

Builds the platform that makes the research possible. Trains the model that the researcher specified. Serves it at production scale. Grades on systems intuition, on-call ownership, and verified scale numbers.

If you'd rather profile a 1,024-GPU training run than write a related-work section, you're in the right place.

// Eight engineering specializations

ML engineering has fractured into specializations too.

"ML engineer" used to mean one thing. In 2026, the frontier labs hire against eight distinct engineering tracks, each with its own panel, its own rubric, and its own comp profile.

// SPEC 01

Training platform engineering

The internal stack researchers use to launch training runs. Parallelism, fault tolerance, checkpointing, telemetry. The biggest single engineering org at most frontier labs.

FSDP, 3D parallel, checkpointing, PyTorch, CUDA

// COMP -- SR IC: $700K--$1.35M

// SPEC 02

Inference engineering

Serving frontier models in production. Paged attention, speculative decoding, batching strategies, throughput-vs-latency trade-offs. Comp has moved up sharply as inference became the cost center.

vLLM, paged attn, spec decode, batching, CUDA

// COMP -- SR IC: $680K--$1.2M

// SPEC 03

GPU scheduling & cluster ops

Squeezing utilization out of frontier-scale compute. Kueue, Ray, Slurm, custom schedulers. Owns the SLA between research and the cluster.

Kueue, Ray, Slurm, k8s, observability

// COMP -- SR IC: $620K--$1.05M

// SPEC 04

Data pipeline engineering

Ingestion, dedup, sharding, streaming, mid-training data fixes. The team that decides whether your 2T-token run is delayed by data plumbing.

streaming, dedup, sharding, tokenization, arrow

// COMP -- SR IC: $640K--$1.0M

// SPEC 05

Eval & observability infra

The harnesses that catch regressions before they ship. Online evals, offline benchmarks, A/B infrastructure, model behaviour monitoring in production.

eval harness, benchmarks, A/B, monitoring

// COMP -- SR IC: $640K--$1.05M

// SPEC 06

ML platform & DevEx

The internal tools researchers and engineers actually use day-to-day. Experiment tracking, hyperparameter search, sweeps, internal notebooks. Quietly load-bearing.

experiment tracking, notebooks, sweeps, internal tools

// COMP -- SR IC: $620K--$980K

// SPEC 07

Model serving & quantization

Specialized serving -- on-device, edge, latency-critical environments. Quantization, distillation, model surgery for production constraints.

quantization, distillation, on-device, TensorRT

// COMP -- SR IC: $650K--$1.05M

// SPEC 08

AI SRE / reliability

Training reliability, inference SLOs, on-call for AI workloads. Newer specialization -- but every frontier lab now has one, and the bar is being calibrated upward.

on-call, SLOs, incident response, observability

// COMP -- SR IC: $620K--$990K

// MAPPED TO YOUR PROFILE

Cohire Copilot maps you to the right specialization.

Open Cohire. It reads your real shipped work -- training-platform commits, inference benchmarks, on-call records -- and plots you against all eight engineering specializations. Honest map; the recommendation often isn't what you'd guess.

// FEATURE -- Open Cohire Copilot

// What frontier labs grade on

Six things ML engineering panels actually score.

The rubric is different from research -- and consistent across the eight engineering specializations. Senior panels look for the same six signals, with the weighting depending on the role.

Verified scale numbers

Talking about "training large models" without numbers is a fast disqualifier. Panels want actual GPU counts, token counts, throughput numbers, p95 latencies -- and the reasoning behind each. Bring receipts.

Systems intuition under pressure

Senior IC panels run a 45-minute systems-design loop. The question is open-ended ("design a training platform for a 100B model run") and they grade on which trade-offs you surface, in what order, with what depth.

On-call ownership

Did you carry the pager for the system you built? Have you debugged a real production AI incident at 3 a.m.? The answer says more about you than any whiteboard problem. Production scars beat clean designs.

Profiling & performance reasoning

For inference and training-platform roles especially: can you reason about memory layout, GPU utilization, communication overhead? Panels will probe the parts of the stack most engineers handwave.

Working with researchers

The job is half engineering, half being the right partner to research. Panels grade how you talk about disagreement with a researcher -- when to push back, when to ship the unergonomic API, when to escalate.

Read-the-code rigor

Strong ML engineering candidates can be dropped into an unfamiliar 50K-line codebase (PyTorch internals, vLLM, FSDP) and trace a bug end-to-end. The take-home tends to test this directly.

// Compensation benchmarks

Senior ML engineering IC comp -- May 2026.

Total compensation (base + equity + bonus, annualized) for senior IC engineering offers across frontier labs. US-based. Sourced from network-verified offers.

Median total comp by engineering track -- USD

Senior IC with 5-8 years of experience. Staff and principal levels run 1.4-2.0x the senior IC band.

SAMPLE: 1,420 ML ENG OFFERS, JAN-APR 2026

Engineering track	Range	Median	YoY
Training platform engineering	$700K -- $1.35M	$920K	+12%
Inference engineering	$680K -- $1.2M	$870K	+26%
GPU scheduling & cluster ops	$620K -- $1.05M	$780K	+9%
Data pipeline engineering	$640K -- $1.0M	$760K	+8%
Eval & observability infra	$640K -- $1.05M	$790K	+19%
ML platform & DevEx	$620K -- $980K	$730K	+7%
Model serving & quantization	$650K -- $1.05M	$800K	+15%
AI SRE / reliability	$620K -- $990K	$740K	+21%

// Sample ML eng roles in network this week

What's on the table right now.

A representative slice of ML engineering roles currently in the OpenTalent network -- quiet listings and public ones. Network members see the full set with match scores against their profile.

Anthropic

SF -- QUIET

93% fit

Staff Engineer -- Training Platform

Owns 3D parallelism, fault tolerance, and checkpointing for the next-generation Claude training stack. Heavy systems-design loop.

training platform, FSDP, CUDA

$1.05M -- $1.35M

OpenAI

91% fit

Sr. Inference Engineer -- Production

Paged attention, speculative decoding, and batching for the API serving stack. The team that owns p95 latency and cost per token.

inference, vLLM, CUDA

$880K -- $1.15M

Google DeepMind

LONDON

88% fit

Engineer III -- GPU Scheduling & Cluster Ops

Owns scheduler utilization across the frontier training cluster. Kueue-based stack, multi-tenant fairness, on-call rotation.

scheduling, Kueue, k8s

£560K -- £760K

xAI

SF -- BAY AREA

85% fit

Sr. Engineer -- Data Pipeline (Pre-training)

Ingestion, dedup, sharding for a multi-trillion-token pre-training corpus. The role behind keeping a 100K-GPU training run fed.

data, streaming, arrow

$960K -- $1.2M

Cohere

TORONTO -- REMOTE

82% fit

Eval Infra Engineer -- Senior IC

Owns the eval and behaviour-monitoring stack across model releases. The team that catches the regression before customers do.

evals, observability, A/B

CA$880K -- CA$1.1M

Mistral

PARIS

79% fit

Sr. Engineer -- Inference (Quantization)

Quantization, distillation, and on-device-ready model serving for open-weights releases. Paris hybrid.

quantization, on-device, open-weights

€480K -- €640K

// By the numbers

Where the network sits on ML engineering right now.

11,400+

ML engineering-track members in the OpenTalent network.

// DEPTH

480

ML engineering roles in the network this quarter -- 60% of them quiet listings.

// ROLES

~5 weeks

Median frontier-lab ML engineering loop, scope to written offer.

// LOOP TIME

+26%

YoY median comp lift for inference engineering roles -- fastest-rising specialization.

// COMP DELTA

// FAQ

Questions ML engineers ask first.

Do frontier labs prefer ML researchers over ML engineers?

No. At every frontier lab in our network, ML engineers are a critical and well-respected hire, and senior IC bands are comparable to research. The day-to-day work, the panels, and the rubric all differ; the ladder, the influence, and the comp do not.

The fastest-rising specialization right now is inference engineering (+26% YoY) -- frontier labs are paying a premium for the people who can serve their models efficiently at scale.

I've been doing "ML infra" at a non-AI company. How transferable is that?

Very, if you can talk concretely about scale, profiling, and on-call. The gap is usually less about raw skills and more about frontier-specific context -- what a real RLHF training run looks like, what production inference at frontier scale demands, the partial-stack reading required to debug a real PyTorch issue.

Cohire Copilot will tell you exactly which gap is yours.

Do I need a PhD for ML engineering roles?

No. Across the network's last 12 months of ML engineering placements, fewer than 15% of placed engineers had a PhD. What every successful candidate had was shipped systems -- training platforms they built, inference stacks they owned, on-call rotations they carried, performance work with measurable wins.

Can I move into ML engineering from regular distributed systems work?

Yes, and it's one of the most common moves in our network. The translation is real: a strong distributed systems engineer who has spent six months reading PyTorch internals, profiling a real training run, and shipping a small inference repo will pass most frontier-lab loops.

Cohire Copilot will surface exactly which translation gap to close.

How "quiet" are the quiet listings for ML engineering?

Quieter than research. 60% of senior IC ML engineering roles in our network this quarter weren't on public careers pages -- frontier labs hire ML engineers heavily through referrals and curated networks. Members of OpenTalent see the full set.

Is it really free?

Free for OpenTalent network members. The hiring lab pays the placement fee -- never you. To join the network, apply through the five-stage screening.

Build the systems that train the frontier.

Apply to OpenTalent. Fewer than 3% of applicants make it. The ones who do see the ML engineering roles, comp, and prep that the broader market doesn't.



From "ML engineer" to "frontier-lab ML engineer."

Map your position

Close the gaps

See the matches

Run the loop


If ML engineering isn't your column.

AI Research roles

Applied AI roles

Early-career track
