Onsite
Senior Software Engineer, Inference - Pika
Software Engineer
Senior engineer focused on accelerating AI inference pipelines with GPU parallelism, CUDA, and TensorRT to improve the performance of video generation models and large‑scale AI products.
About the role
Key Responsibilities
- Design, implement, and maintain high‑performance inference pipelines for AI‑driven video generation and other creative applications.
- Apply advanced GPU parallelism techniques, including CUDA kernel optimization and TensorRT integration, to reduce latency and increase throughput.
- Collaborate with research scientists to translate cutting‑edge models into production‑ready, scalable deployments.
- Profile and benchmark models, identify bottlenecks, and apply quantization, pruning, and other optimization strategies.
- Develop tooling and automation for continuous performance testing and deployment across multi‑GPU and distributed environments.
Requirements
- 5+ years of software engineering experience with a strong focus on GPU programming (CUDA) and deep learning frameworks such as PyTorch.
- Proven expertise with inference acceleration tools such as TensorRT or ONNX Runtime, or with custom kernel development.
- Hands‑on experience optimizing large models for real‑time video generation or similar high‑throughput workloads.
- Solid understanding of performance profiling, memory management, and parallel algorithm design.
- Ability to work cross‑functionally with researchers, product engineers, and infrastructure teams to deliver production‑grade AI solutions.