
ML Infrastructure Engineer with 3+ years in GPU/CPU Performance & LLM Serving
GPU systems and ML inference infrastructure engineer. I make models run faster and cheaper on real hardware: CUDA kernels, GPU/CPU performance, NUMA-aware multi-node systems, quantization, and LLM serving runtimes. Production contributions to llama.cpp and OpenBLAS. First-authored IEEE HIPC paper. M.Tech (Systems & Compilers), IIT Bombay.
IIT Bombay
M.Tech · Computer Science & Engineering
August 1, 2021 – June 30, 2023
Fujitsu Research
Software Development Engineer
June 1, 2023 – Present
India
Low-Level Static Analysis Engine for C++
January 1, 2021 – January 1, 2023
Built an LLVM-IR static analysis engine for C++: custom alias-analysis passes feed SAT/SMT constraints into a bounded model checker, with full exception-handling (invoke/landingpad/resume) encoding. Cut solve time by 13% and verified 6 of 10 cases, where CBMC crashed on all 10.
Cassandra-Inspired Distributed KV Store
January 1, 2021 – January 1, 2023
Built a leaderless 6-node KV store: Chord ring with finger tables for O(log n) gRPC routing; gossip protocol for decentralised membership and failure detection; LSM-tree write path with locked memtable, async SSTable flush, and background compaction; node addition/removal with automatic key rebalancing; and a cache with fine-grained locking.
PageRank Acceleration
January 1, 2021 – January 1, 2023
Implemented GPU-accelerated PageRank (power iteration) in CUDA: CSR graph representation for coalesced warp memory access, shared-memory reduction for convergence checks, and pointer-swap double buffering to eliminate data races. Achieved ~50× speedup over a CPU baseline on 1M-node graphs via full SM occupancy on a T4.
Maximizing Multi-Core Efficiency in BLAS: A Scalable Architecture for Performance.
IEEE HIPC 2024 / arXiv
January 1, 2024 – Present
oneDAL Optimization for ARM SVE
IEEE HIPC 2024
January 1, 2024 – Present
Cultural Fit Analysis
The candidate's profile shows a strong fit for a high-performance, research-oriented engineering culture. Their academic projects and professional experience center on challenging, low-level optimization problems with measured performance gains. Contributions to open-source projects such as llama.cpp and OpenBLAS, along with publications, indicate a proactive and collaborative approach to problem-solving and a desire to contribute to the broader technical community. The range of projects, from static analysis to distributed KV stores and GPU acceleration, demonstrates broad technical curiosity and adaptability.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills, evidenced by their ability to diagnose and resolve complex performance issues in large-scale systems. Their contributions to open-source projects and publications suggest a collaborative and knowledge-sharing mindset. The detailed descriptions of their work indicate a methodical approach to engineering and a focus on measurable impact. Their recognition as 'Employee of the Quarter' and with the 'Fujitsu Grand Award' further highlights their operational excellence and impact.