Remote
AI Performance Optimization Engineer - BV Teck
Software Engineer
About the role
As an AI Performance Optimization Engineer, you will accelerate deep learning workloads using Python, TensorFlow, PyTorch, and CUDA, and profile and tune models so they scale in production.
Key Responsibilities
- Analyze and benchmark AI models to identify performance bottlenecks across CPU, GPU, and distributed environments.
- Implement optimizations using TensorFlow, PyTorch, and CUDA, including mixed‑precision training, kernel fusion, and memory management.
- Develop automated profiling pipelines and dashboards to monitor inference latency, throughput, and resource utilization.
- Collaborate with data scientists and software engineers to integrate optimized models into production pipelines.
- Document best practices, create technical guides, and conduct knowledge‑sharing sessions on AI performance engineering.
Requirements
- Strong programming skills in Python and experience with TensorFlow or PyTorch.
- Hands‑on experience with CUDA, cuDNN, and GPU profiling tools.
- Proficiency with performance profiling tools (e.g., Nsight, nvprof, TensorBoard) and optimization techniques.
- Solid understanding of distributed training, model quantization, and inference optimization.
- Excellent problem‑solving skills and the ability to communicate complex technical concepts clearly.
Skills
Python, machine learning, TensorFlow, PyTorch, CUDA