REHAN ALAM

Software Development Engineer

https://www.opentalent.in/rehan-alam

Key Strengths

Exceptional depth in GPU programming (CUDA, HIP/ROCm, OpenCL) and parallel computing models, highly relevant for high-performance computing (HPC) and AI/ML roles.
Proven ability to optimize deep learning inference using ONNX Runtime and TensorRT, achieving significant latency reductions (35-40%) and model size reduction (60%).
Strong practical experience with performance profiling tools (NVIDIA Nsight, rocprof) and memory access pattern tuning for GPU kernels.
Proficiency in porting CPU-bound ML preprocessing pipelines to GPU-accelerated implementations, demonstrating a deep understanding of host-device interactions.
Hands-on experience with MLOps practices, including Docker, FastAPI, Terraform, Airflow, and PySpark for data pipelines and microservice deployment.
Demonstrated strong problem-solving skills through competitive programming achievements (1200+ DSA problems, top ranks in contests).
Excellent alignment with a Software Development Engineer role, particularly one focused on high-performance computing, AI/ML integration, or system optimization.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate exhibits a strong drive for technical excellence and continuous learning, evidenced by successful personal projects and competitive programming achievements. Their focus on delivering high-performance, optimized solutions aligns with a results-oriented culture. The experience in training and collaboration suggests a team-player mindset, making them a good cultural fit for an innovative and technically demanding environment.

Soft Skills & Operational Fit

The candidate demonstrates strong problem-solving aptitude and a proactive approach to technical challenges. Experience in collaborating with system architects and ML engineers, along with authoring performance runbooks and training other engineers, indicates good teamwork and mentorship potential. The psychometric test score (73%) suggests a reasonable work attitude and ability to handle stress, contributing positively to operational fit.

About

GPU Programming Engineer with 2+ years of hands-on experience developing, optimizing, and deploying GPU-accelerated solutions for deep learning and HPC workloads. Proficient in CUDA, HIP/ROCm, and OpenCL with strong command of parallel computing models, memory hierarchy optimization, and kernel performance tuning. Experienced in porting CPU-based workloads to GPU platforms, reducing inference latency by up to 40% using ONNX Runtime and TensorRT. Skilled in C/C++ and Python on Linux-based environments, with exposure to profiling tools including NVIDIA Nsight and rocprof. Comfortable working across architecture, ML, and systems teams to ship high-performance compute solutions at scale.

Top Skills

CUDA, Confluence, Postman, Airflow

Projects

VulnHunter: Distributed Parallel Scan Orchestration Platform

January 1, 2022 – January 1, 2022

Architected a distributed, parallelised security validation platform with a FastAPI control plane; designed a lease-based job scheduler (SQLAlchemy + SQLite) using idempotent worker execution and heartbeat monitoring — directly analogous to GPU task-queue and kernel concurrency management. Built modular Python + C/httpx workers independently deployable as containers on Linux; implemented structured audit logging and reproducible scan evidence mirroring the traceability standards of HPC performance benchmarking.
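The lease-and-heartbeat pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: it uses the standard-library `sqlite3` module in place of SQLAlchemy to stay self-contained, and the table layout, the function names (`init_db`, `enqueue`, `claim_job`, `heartbeat`, `complete`) and the `LEASE_SECONDS` value are all assumptions made for the example.

```python
import sqlite3

LEASE_SECONDS = 30.0  # hypothetical lease duration, chosen for the example


def init_db(conn):
    """Create the (assumed) jobs table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               owner TEXT,
               lease_expires REAL
           )"""
    )
    conn.commit()


def enqueue(conn, payload):
    """Add a pending job and return its id."""
    with conn:
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_job(conn, worker_id, now):
    """Lease one pending job, or one whose lease has expired; return its id or None."""
    claimable = "(status = 'pending' OR (status = 'leased' AND lease_expires < ?))"
    with conn:
        row = conn.execute(
            f"SELECT id FROM jobs WHERE {claimable} ORDER BY id LIMIT 1", (now,)
        ).fetchone()
        if row is None:
            return None
        # Re-check the claim condition in the UPDATE so a racing worker cannot
        # take the same job twice.
        cur = conn.execute(
            f"UPDATE jobs SET status = 'leased', owner = ?, lease_expires = ? "
            f"WHERE id = ? AND {claimable}",
            (worker_id, now + LEASE_SECONDS, row[0], now),
        )
        return row[0] if cur.rowcount == 1 else None


def heartbeat(conn, job_id, worker_id, now):
    """Extend the lease; returns False if this worker no longer owns the job."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET lease_expires = ? "
            "WHERE id = ? AND owner = ? AND status = 'leased'",
            (now + LEASE_SECONDS, job_id, worker_id),
        )
    return cur.rowcount == 1


def complete(conn, job_id, worker_id):
    """Mark the job done; only the current lease owner's call has any effect."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'done' "
            "WHERE id = ? AND owner = ? AND status = 'leased'",
            (job_id, worker_id),
        )
    return cur.rowcount == 1
```

In this sketch a worker that stops sending heartbeats simply lets its lease lapse, after which another worker can reclaim the job. That is why the work itself needs to be idempotent: the same job may be started more than once.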

Early Chronic Disease Detection: GPU-Accelerated ML Inference Platform

January 1, 2022 – January 1, 2022

Accelerated model inference with ONNX Runtime (CUDA EP) and TensorRT. Profiled with NVIDIA Nsight to identify memory-bound kernels, then tuned warp occupancy and shared memory bank access, achieving a 35-40% latency reduction vs. a CPU-only baseline. Applied post-training quantization (INT8/FP16) and neural network operator fusion to reduce model size by 60% without accuracy loss; containerised the service as a Dockerised FastAPI image deployable on Linux-based GPU nodes. Built a full ML pipeline (feature engineering, SMOTE, SHAP explainability) with clean C-extension hooks for latency-critical preprocessing paths, simulating the CPU-to-GPU porting challenge common in HPC healthcare deployments.
