Remote
AI Inference Engineer - DeepInfra
AI Engineer
DeepInfra is hiring an AI Inference Engineer to design and optimize high-performance inference systems for scalable AI deployments across diverse hardware platforms.
About the role
Key Responsibilities
- Develop high-performance inference engines for AI models across diverse hardware platforms
- Optimize model architectures for low-latency and high-throughput inference
- Implement quantization, pruning, and other optimization techniques
- Collaborate with hardware teams to leverage GPU/TPU acceleration
- Design benchmarking frameworks to evaluate inference performance
- Ensure cross-platform compatibility and scalability of inference solutions
Requirements
- 5+ years of experience in AI inference or related fields
- Expertise in Python and C++ with GPU programming experience
- Strong understanding of model optimization techniques
- Experience with CUDA, OpenCL, or similar acceleration frameworks
- Familiarity with AI frameworks (PyTorch, TensorFlow) and deployment tools
Skills
- Python
- C
- GPU programming
- Model optimization
- Inference systems
- CUDA