Remote
Inference Engineer - Deepinfra Inc.
Deepinfra Inc. is looking for an Inference Engineer to optimize and deploy high-performance AI models, with a focus on low-latency systems and hardware acceleration.
About the Role
Key Responsibilities
- Optimize and deploy ML models for high-performance inference at scale
- Develop low-latency systems for real-time AI applications
- Implement quantization, pruning, and other optimization techniques
- Collaborate with hardware teams to maximize utilization
- Benchmark and profile inference performance across different platforms
- Ensure reliability and efficiency of production inference pipelines
Requirements
- 3+ years in systems programming or ML inference optimization
- Expertise in C++ and Python for performance-critical applications
- Experience with GPU computing and CUDA programming
- Knowledge of model optimization techniques and hardware acceleration
- Strong debugging and profiling skills for performance tuning
Skills
C, Python, GPU computing, model optimization, CUDA, inference systems