Software Engineer, Inference - Performance Optimization
As a Software Engineer on Inference Performance Optimization, you will model inference performance across the application, model, and fleet layers to identify bottlenecks and improve efficiency. You will build and refine performance models, analyze inference workloads, and improve the tooling used to optimize latency and throughput.
Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches.
In this role, you will raise the fidelity of our inference performance models across the application, model, and fleet layers. You will build cost-to-serve estimates from microbenchmarks and create tools that help cross-functional teams reason about tradeoffs among latency, capacity, utilization, and cost.
Posted June 7, 2026