onsite
Inference Engineer
Cartesia is seeking an Inference Engineer to design and build low-latency, scalable, and reliable model inference and serving stacks for their cutting-edge foundation models. This role involves close collaboration with research and product teams to deliver fast, cost-effective AI solutions and build robust inference infrastructure.
About the Role
We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence.
Your Impact
- Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models, which use Transformers, SSMs, and hybrid architectures.
- Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner.
- Design and build robust inference infrastructure and monitoring for our products.
- Have significant autonomy to shape our products and directly impact how cutting-edge AI is applied across various devices and applications.
What You Bring
Given the scale and difficulty of problems we work on, we value strong engineering skills at Cartesia.
- Strong engineering skills, including comfort navigating complex codebases and an eye for writing clean, maintainable code.
- Experience building large-scale distributed systems with high demands on performance, reliability, and observability.
- Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity.
- Background in or experience working on inference pipelines with machine learning and generative models.
- Experience implementing state-of-the-art machine learning models and applying research to practical problems.
- Preferred: experience with inference frameworks such as vLLM or SGLang, and with techniques such as continuous batching.
- Preferred: experience working in CUDA, Triton, or similar.
Skills
Transformers, SSMs, vLLM, SGLang, Continuous Batching, CUDA, Triton, Machine Learning, Generative Models, Distributed Systems