
Software Development Engineer with 1+ years in GPU-accelerated Deep Learning & HPC
GPU Programming Engineer with 2+ years of hands-on experience developing, optimizing, and deploying GPU-accelerated solutions for deep learning and HPC workloads. Proficient in CUDA, HIP/ROCm, and OpenCL with strong command of parallel computing models, memory hierarchy optimization, and kernel performance tuning. Experienced in porting CPU-based workloads to GPU platforms, reducing inference latency by up to 40% using ONNX Runtime and TensorRT. Skilled in C/C++ and Python on Linux-based environments, with exposure to profiling tools including NVIDIA Nsight and rocprof. Comfortable working across architecture, ML, and systems teams to ship high-performance compute solutions at scale.
RGPV University, Madhya Pradesh
B.Tech · Computer Science
January 1, 2018 – January 1, 2022
Meril Life Sciences
Software Development Engineer I | GPU-Accelerated Systems & AI/ML Integration
June 1, 2024 – Present
Vapi, Gujarat, India
VulnHunter Distributed Parallel Scan Orchestration Platform
January 1, 2022 – January 1, 2022
Architected a distributed, parallelised security validation platform with a FastAPI control plane; designed a lease-based job scheduler (SQLAlchemy + SQLite) using idempotent worker execution and heartbeat monitoring — directly analogous to GPU task-queue and kernel concurrency management. Built modular Python + C/httpx workers independently deployable as containers on Linux; implemented structured audit logging and reproducible scan evidence mirroring the traceability standards of HPC performance benchmarking.
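As an illustration of the lease-based scheduling pattern described above, here is a minimal sketch. It is not the project's code: it uses the standard-library sqlite3 module rather than SQLAlchemy, and the table layout, the function names (open_queue, enqueue, claim, heartbeat, complete), and the 30-second lease length are all assumptions made for the example.

```python
import sqlite3
import time

LEASE_SECONDS = 30.0  # hypothetical lease length, chosen for illustration


def open_queue(path=":memory:"):
    """Create the (assumed) jobs table and return a connection."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " owner TEXT,"
        " lease_expires REAL)"
    )
    return db


def enqueue(db, payload):
    cur = db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    db.commit()
    return cur.lastrowid


def claim(db, worker, now=None):
    """Lease the oldest job that is pending or whose lease has expired."""
    now = time.time() if now is None else now
    claimable = "(status = 'pending' OR (status = 'leased' AND lease_expires < ?))"
    row = db.execute(
        f"SELECT id FROM jobs WHERE {claimable} ORDER BY id LIMIT 1", (now,)
    ).fetchone()
    if row is None:
        return None
    # Guarded update: succeeds only if the row is still claimable.
    cur = db.execute(
        f"UPDATE jobs SET status = 'leased', owner = ?, lease_expires = ?"
        f" WHERE id = ? AND {claimable}",
        (worker, now + LEASE_SECONDS, row[0], now),
    )
    db.commit()
    return row[0] if cur.rowcount == 1 else None


def heartbeat(db, job_id, worker, now=None):
    """Extend the lease; returns False if this worker no longer owns the job."""
    now = time.time() if now is None else now
    cur = db.execute(
        "UPDATE jobs SET lease_expires = ?"
        " WHERE id = ? AND owner = ? AND status = 'leased'",
        (now + LEASE_SECONDS, job_id, worker),
    )
    db.commit()
    return cur.rowcount == 1


def complete(db, job_id, worker):
    """Mark the job done; repeating the call is a harmless no-op."""
    cur = db.execute(
        "UPDATE jobs SET status = 'done', lease_expires = NULL"
        " WHERE id = ? AND owner = ? AND status = 'leased'",
        (job_id, worker),
    )
    db.commit()
    return cur.rowcount == 1
```

In this sketch a worker that stops sending heartbeats simply lets its lease lapse, after which another worker can claim the job, and a late completion from the original worker is rejected because the owner no longer matches.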
Early Chronic Disease Detection GPU-Accelerated ML Inference Platform
January 1, 2022 – January 1, 2022
Accelerated model inference with ONNX Runtime (CUDA EP) and TensorRT; profiled with NVIDIA Nsight to identify memory-bound kernels, then tuned warp occupancy and shared-memory bank access, achieving a 35-40% latency reduction vs. a CPU-only baseline. Applied post-training quantization (INT8/FP16) and neural network operator fusion to reduce model size by 60% without accuracy loss; containerised the service as a Dockerised FastAPI image deployable on Linux-based GPU nodes. Built a full ML pipeline (feature engineering, SMOTE, SHAP explainability) with clean C-extension hooks for latency-critical preprocessing paths, simulating the CPU-to-GPU porting challenge common in HPC healthcare deployments.
Cultural Fit Analysis
The candidate exhibits a strong drive for technical excellence and continuous learning, evidenced by successful personal projects and competitive programming achievements. Their focus on delivering high-performance, optimized solutions aligns with a results-oriented culture. The experience in training and collaboration suggests a team-player mindset, making them a good cultural fit for an innovative and technically demanding environment.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving aptitude and a proactive approach to technical challenges. Experience in collaborating with system architects and ML engineers, along with authoring performance runbooks and training other engineers, indicates good teamwork and mentorship potential. The psychometric test score (73%) suggests a reasonable work attitude and ability to handle stress, contributing positively to operational fit.