Key Strengths

Deep expertise in GPU programming (CUDA) and performance optimization for ML workloads, evidenced by significant speedups (3.14x, 50x) and contributions to llama.cpp and OpenBLAS.
Strong understanding of low-level system architecture, including NUMA, memory hierarchies (SRAM, HBM), and SIMD (SVE, NEON) for high-performance computing.
Proven ability to diagnose and resolve complex performance bottlenecks in multi-threaded and distributed systems.
Experience with quantization techniques (Q4_0, Q8_0, Q2_K, Q3_K, Q4_HQQ) directly relevant to ML inference efficiency.
Academic background in Systems Software & Compilers (IIT Bombay) and practical experience align perfectly with an ML Infrastructure Engineer role.
Demonstrated ability to contribute to open-source projects (llama.cpp, OpenBLAS) and publish research (IEEE HIPC).

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's profile shows a strong fit for a high-performance, research-oriented engineering culture. Their academic projects and professional experience involve tackling challenging, low-level optimization problems, often pushing the boundaries of performance. Contributions to open-source projects like llama.cpp and OpenBLAS, along with publications, indicate a proactive and collaborative approach to problem-solving and a desire to contribute to the broader technical community. The diversity of projects, from static analysis to distributed KV stores and GPU acceleration, demonstrates a broad technical curiosity and adaptability.

Soft Skills & Operational Fit

The candidate demonstrates strong problem-solving skills, evidenced by their ability to diagnose and resolve complex performance issues in large-scale systems. Their contributions to open-source projects and publications suggest a collaborative and knowledge-sharing mindset. The detailed descriptions of their work indicate a methodical approach to engineering and a focus on measurable impact. Their 'Employee of the Quarter' recognition and 'Fujitsu Grand Award' further highlight their operational excellence and impact.

Shivam Gautam

ML Infrastructure Engineer

https://www.opentalent.in/shivam-gautam-7165635

ML Infrastructure Engineer with 3+ years in GPU/CPU Performance & LLM Serving

Fujitsu Research

Member since June 24, 2026

About

GPU systems and ML inference infrastructure engineer. I make models run faster and cheaper on real hardware: CUDA kernels, GPU/CPU performance, NUMA-aware multi-node systems, quantization, and LLM serving runtimes. Production contributions to llama.cpp and OpenBLAS. First-authored IEEE HIPC paper. M.Tech (Systems & Compilers), IIT Bombay.

Top Skills

CUDA, Hugging Face, LLVM, TensorFlow, Docker

Skills

CUDA, llama.cpp, vLLM, PyTorch, TensorFlow, Hugging Face, Ollama, GGML, Multithreading, OpenMP, pthreads, Valgrind, Linux, gRPC, Docker, LLVM, Java

Education

IIT Bombay

M.Tech · Computer Science & Engineering

August 1, 2021 – June 30, 2023

Experience

Fujitsu Research

Software Development Engineer

June 1, 2023 – Present

India

Projects

Low-Level Static Analysis Engine for C++

January 1, 2021 – January 1, 2023

Built an LLVM-IR static analysis engine for C++: custom alias-analysis passes feed SAT/SMT constraints into a bounded model checker, with full exception-handling (invoke/landingpad/resume) encoding; cut solve time by 13% and verified 6 of 10 cases, all of which CBMC crashed on.

Cassandra-Inspired Distributed KV Store

January 1, 2021 – January 1, 2023

Built a leaderless 6-node KV store: a CHORD ring with finger tables for O(log n) routing over gRPC, plus a gossip protocol for decentralised membership and failure detection; an LSM-tree write path with a locked memtable, async SSTable flush, and background compaction; node addition/removal with automatic key rebalancing; and a cache with fine-grained locking.
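
For readers unfamiliar with finger-table routing, the following is a minimal single-process sketch of a Chord-style lookup. It is not the project's code, which is not shown in this profile. The 8-bit identifier space, the six node IDs, and all class and function names are assumptions chosen for illustration, and the gRPC, gossip, and LSM-tree parts are left out entirely.

```python
from bisect import bisect_left

M = 8            # identifier bits (assumed for illustration)
RING = 1 << M    # size of the identifier space


def in_open_closed(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wraps past zero; a == b means the whole ring


def in_open_open(x, a, b):
    """True if x lies in the circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps past zero; a == b means everything but a


class ChordRing:
    """Simulates all nodes of a Chord ring in one process (hypothetical helper)."""

    def __init__(self, node_ids):
        self.nodes = sorted(set(n % RING for n in node_ids))
        # fingers[n][i] is the first node at or after n + 2^i on the ring.
        self.fingers = {
            n: [self.successor((n + (1 << i)) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        """First node at or clockwise after ident (the owner of that key)."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def closest_preceding(self, node, key):
        """Farthest finger of `node` that still lies strictly before `key`."""
        for finger in reversed(self.fingers[node]):
            if in_open_open(finger, node, key):
                return finger
        return node

    def lookup(self, start, key):
        """Route from `start` to the owner of `key`; returns (owner, hops)."""
        node, hops = start, 0
        # Stop once the key falls between this node and its direct successor.
        while not in_open_closed(key, node, self.fingers[node][0]):
            node = self.closest_preceding(node, key)
            hops += 1
        return self.fingers[node][0], hops


if __name__ == "__main__":
    ring = ChordRing([1, 30, 75, 120, 180, 230])  # six hypothetical node IDs
    owner, hops = ring.lookup(start=1, key=200)
    print(f"key 200 is owned by node {owner}, reached in {hops} hop(s)")
```

Each hop jumps to the farthest finger that does not overshoot the key, which at least halves the remaining distance to the key's predecessor. That halving is where the O(log n) hop count comes from.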

PageRank Acceleration

January 1, 2021 – January 1, 2023

Certifications

Maximizing Multi-Core Efficiency in BLAS: A Scalable Architecture for Performance.

IEEE HIPC 2024 / arXiv

January 1, 2024 – Present

oneDAL Optimization for ARM SVE

IEEE HIPC 2024

January 1, 2024 – Present
