Software Development Engineer with 3+ years in GPU & ML Inference Infrastructure
GPU systems and ML inference infrastructure engineer. I make models run faster and cheaper on real hardware: CUDA kernels, GPU/CPU performance, NUMA-aware multi-node systems, quantization, and LLM serving runtimes. Production contributions to llama.cpp and OpenBLAS. First-authored IEEE HIPC paper. M.Tech (Systems & Compilers), IIT Bombay.
IIT Bombay
M.Tech · Computer Science & Engineering
August 1, 2021 – June 30, 2023
Fujitsu Research
Software Development Engineer
June 1, 2023 – Present
India
Low-Level Static Analysis Engine for C++
January 1, 2021 – January 1, 2023
Built an LLVM-IR static analysis engine for C++: custom alias-analysis passes feed SAT/SMT constraints into a bounded model checker, with full exception-handling (invoke/landingpad/resume) encoding. Cut solve time by 13% and verified 6 of 10 cases, all 10 of which crashed CBMC.
Cassandra-Inspired Distributed KV Store
January 1, 2021 – January 1, 2023
Built a leaderless 6-node KV store — CHORD ring with finger tables for O(log n) gRPC routing, gossip protocol for decentralised membership and failure detection; LSM-tree write path with locked memtable, async SSTable flush, and background compaction; node addition/removal with automatic key rebalancing and cache with fine-grained locking.
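For readers unfamiliar with finger-table routing, the lookup described above can be sketched roughly as follows. This is a minimal single-process illustration, not the project's code: the 8-bit identifier space, the node IDs, and the helper names (`successor`, `build_fingers`, `lookup`) are all assumptions, and the gRPC transport, gossip, and storage layers are left out.

```python
from bisect import bisect_left

M = 8              # identifier bits (assumed for illustration)
RING = 1 << M      # size of the identifier space


def successor(nodes, ident):
    """First node clockwise from ident; nodes must be sorted."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def build_fingers(nodes):
    """finger[i] of node n is the successor of n + 2**i on the ring."""
    return {n: [successor(nodes, n + (1 << i)) for i in range(M)] for n in nodes}


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def lookup(fingers, start, key):
    """Route from start to the node that owns key; returns (owner, hops)."""
    node, hops = start, 0
    while True:
        succ = fingers[node][0]
        if in_half_open(key, node, succ):
            return succ, hops
        # Jump to the farthest finger that still precedes the key.
        nxt = node
        for f in reversed(fingers[node]):
            if in_open(f, node, key):
                nxt = f
                break
        if nxt == node:
            return succ, hops
        node, hops = nxt, hops + 1


nodes = [5, 40, 90, 130, 200, 250]   # hypothetical 6-node ring
fingers = build_fingers(nodes)
owner, hops = lookup(fingers, start=5, key=210)
```

Each hop moves strictly clockwise toward the key without passing it, which is what keeps the number of forwarded requests small as the ring grows.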
PageRank Acceleration
January 1, 2021 – January 1, 2023
Implemented GPU-accelerated PageRank (power iteration) in CUDA — CSR graph representation for coalesced warp memory access, shared memory reduction for convergence checks, and pointer-swap double buffering to eliminate data-race conditions; achieved ~50× speedup over CPU baseline on 1M-node graphs via full SM occupancy on T4.
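The structure of that computation can be shown with a small CPU-side sketch. It is written in plain Python rather than CUDA and is not the project's kernel: the function name, the damping factor of 0.85, the L1 convergence test, and the dangling-node handling are assumptions. The CSR arrays store each node's incoming neighbours, and the two rank buffers are swapped by reference after every iteration so that reads and writes never touch the same array.

```python
def pagerank_csr(row_ptr, col_idx, out_degree, damping=0.85, tol=1e-10, max_iter=200):
    """Pull-based power iteration over a CSR list of incoming edges."""
    n = len(row_ptr) - 1
    rank = [1.0 / n] * n        # read buffer
    next_rank = [0.0] * n       # write buffer
    for _ in range(max_iter):
        # Rank held by nodes with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        for v in range(n):
            acc = 0.0
            for k in range(row_ptr[v], row_ptr[v + 1]):
                u = col_idx[k]
                acc += rank[u] / out_degree[u]
            next_rank[v] = base + damping * acc
        # Convergence check: a reduction over per-node differences.
        delta = sum(abs(a - b) for a, b in zip(rank, next_rank))
        # Swap buffers by reference instead of copying.
        rank, next_rank = next_rank, rank
        if delta < tol:
            break
    return rank


# Hypothetical 3-node graph with edges 0->2, 1->2, 2->0.
row_ptr = [0, 1, 1, 3]      # in-edge offsets per node
col_idx = [2, 0, 1]         # sources of those in-edges
out_degree = [1, 1, 1]
ranks = pagerank_csr(row_ptr, col_idx, out_degree)
```

On a GPU, the loop over `v` would typically map to threads and the `delta` sum to a shared-memory reduction, as the project description indicates.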
Maximizing Multi-Core Efficiency in BLAS: A Scalable Architecture for Performance.
IEEE HIPC 2024 / arXiv
January 1, 2024 – Present
oneDAL Optimization for ARM SVE
IEEE HIPC 2024
January 1, 2024 – Present
Cultural Fit Analysis
The candidate's academic background from IIT Bombay and their current role at Fujitsu Research, combined with contributions to open-source projects (llama.cpp, OpenBLAS) and publications, suggest a strong alignment with a research-oriented, high-performance engineering culture. Their diverse project portfolio, ranging from static analysis to distributed systems and GPU acceleration, indicates adaptability and a broad technical curiosity. The 'Employee of the Quarter' award further highlights their commitment and impact within a professional setting.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills, evidenced by their ability to diagnose and resolve complex performance issues (e.g., throughput cliff in llama.cpp, OpenBLAS oversubscription). Their contributions to open-source projects and publications suggest a collaborative mindset and a drive for continuous learning and sharing knowledge. The detailed descriptions of their work indicate a methodical approach to engineering and a focus on measurable impact.