Remote

GPU Software Engineer (CUDA) - Bright Vision Technologies

Software Engineer

Develop high-performance GPU-accelerated applications using CUDA and C++ on Linux, optimizing parallel algorithms and computational pipelines to enable scalable, secure automation solutions for enterprise customers.

About the role

GPU Software Engineer (CUDA)

Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases.
Profile and optimize GPU code using tools such as Nsight Systems, Nsight Compute, and other CUDA profilers.
Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance.
Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines.
Develop benchmarks and regression tests to safeguard performance over time.
Evaluate new GPU architectures and feature sets, and advise on adoption strategy.
Contribute to compiler-level optimizations for tensor programs where appropriate, working at the boundary between ML frameworks and accelerator code generation to reach performance that framework-level tuning alone cannot.
Optimize memory hierarchy usage across HBM, L2, shared memory, and registers.
Implement mixed-precision and quantized compute paths that maximize accelerator throughput while keeping numerical error within acceptable bounds for the target workloads.
Document performance characteristics, design decisions, and tuning playbooks for internal teams.
Stay current with GPU architecture, CUDA evolution, and emerging accelerator technologies.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
Six or more years of experience in GPU programming and performance engineering.
Deep expertise in CUDA C/C++ and GPU programming models.
Strong understanding of modern GPU architectures, memory hierarchies, and execution models.
Hands-on experience profiling and optimizing GPU workloads in production.
Familiarity with NCCL, MPI, and high-performance interconnect technologies.
Experience integrating custom kernels into ML frameworks.
Strong C++ skills and familiarity with modern systems programming practices.
Solid grounding in linear algebra and numerical methods.
Strong communication and collaboration skills with research and engineering teams.
Experience with Triton, CUTLASS, or other GPU kernel authoring frameworks.
Familiarity with TensorRT, FasterTransformer, or vLLM internals.
Exposure to compiler infrastructure such as LLVM or MLIR.
Open-source contributions to GPU or ML performance libraries.
Experience with large-scale distributed training infrastructure.

Equal Employment Opportunity (EEO) Statement

Bright Vision Technologies (BV Teck) is committed to equal
