
Data Engineer with 8+ years in real-time, planet-scale data platforms, ML/AI systems, and streaming
Senior Data Engineer with 9+ years building real-time, planet-scale data platforms at NVIDIA, Meta, Datadog, and Google. Expert in streaming architectures, big data processing, and multi-cloud infrastructure, with a strong focus on ML/AI systems through feature stores, GPU-accelerated pipelines, and retrieval-augmented generation. Combines deep engineering with a product mindset to modernize ecosystems, ensure reliability, and drive data platform adoption across organizations.
University College London
MSc · Data Science / Data Science and Machine Learning
August 1, 2016 – June 30, 2017
University College London
BSc · Computer Science
August 1, 2013 – June 30, 2016
Nvidia
Senior Data Engineer
October 1, 2024 – Present
India
Datadog
Senior Data Engineer
August 1, 2022 – September 1, 2024
India
Meta
Data Engineer
November 1, 2018 – July 1, 2022
India
Google
Software Engineer Intern
October 1, 2017 – October 1, 2018
India
Real-Time Cryptocurrency Market Data Pipeline
June 1, 2026 – Present
Architected a Kafka-based ingestion layer consuming WebSocket feeds from Binance, Coinbase, and Kraken, normalizing heterogeneous schemas into a unified Protobuf-serialized format. Built a Flink streaming job that windows raw trades into 1-second OHLCV candles, enriches them with order-book imbalance metrics, and sinks to Apache Iceberg on AWS S3. Deployed a Trino query engine and Grafana dashboard for live market monitoring; orchestrated daily dbt transformations to materialize summary tables. Achieved sub-second end-to-end latency at 500M+ events/day, enabling a quantitative researcher to backtest a momentum strategy that outperformed the market by 12% in simulation.
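The windowing step in this project can be illustrated in plain Python, independent of Flink. This is a minimal sketch, not the project's code: the names `Trade`, `Candle`, `to_candles`, and `book_imbalance` are hypothetical, trades are assumed to carry an epoch-millisecond event time, and the imbalance formula is one common definition chosen for illustration. Late or out-of-order events, which a production streaming job would need to handle with watermarks, are simply sorted here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, NamedTuple


class Trade(NamedTuple):
    ts_ms: int    # event time in epoch milliseconds (assumed field)
    price: float
    size: float


@dataclass
class Candle:
    window_start_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def to_candles(trades: Iterable[Trade], window_ms: int = 1000) -> list[Candle]:
    """Group trades into tumbling windows and compute OHLCV per window."""
    candles: dict[int, Candle] = {}
    for t in sorted(trades, key=lambda tr: tr.ts_ms):
        # Align the event time to the start of its tumbling window.
        start = t.ts_ms - t.ts_ms % window_ms
        c = candles.get(start)
        if c is None:
            # First trade in the window sets open, high, low, and close.
            candles[start] = Candle(start, t.price, t.price, t.price, t.price, t.size)
        else:
            c.high = max(c.high, t.price)
            c.low = min(c.low, t.price)
            c.close = t.price
            c.volume += t.size
    return [candles[k] for k in sorted(candles)]


def book_imbalance(bid_size: float, ask_size: float) -> float:
    """Illustrative definition: (bid - ask) / (bid + ask), bounded in [-1, 1]."""
    total = bid_size + ask_size
    return 0.0 if total == 0 else (bid_size - ask_size) / total
```

For example, four trades at 1000, 1500, 1999, and 2000 ms fall into two candles, one starting at 1000 ms and one at 2000 ms.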
RAG-Powered Research Paper Knowledge Base
June 1, 2026 – Present
Scraped and parsed 100,000+ arXiv PDFs, extracting text, tables, and figure captions using a custom PyMuPDF + Camelot pipeline, then chunked documents with LangChain. Generated embeddings via sentence-transformers/all-mpnet-base-v2 and stored them in a Qdrant vector database, with hybrid keyword-semantic retrieval tuned for technical queries. Built a FastAPI backend that retrieves top-k chunks and feeds them as context to a Llama 3 (8B) model running on a cloud GPU, returning answers with linked sources. Containerized the entire stack (Docker Compose) and deployed on an AWS EC2 instance; served as a personal research assistant that cut literature review time by 70%.
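The profile does not say how the keyword and semantic results were combined. One widely used option is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name `reciprocal_rank_fusion`, the chunk IDs, and the constant `k = 60` are assumptions, not details from the project; the two input rankings stand in for whatever a keyword search and a vector search would return.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_k: int = 3
) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one top-k list.

    Each chunk scores 1 / (k + rank) for every list it appears in,
    so chunks ranked well by both retrievers rise to the top.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]


# Hypothetical chunk IDs from a keyword search and a semantic search.
keyword_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

RRF needs only ranks, not comparable scores, which is why it is a common choice when keyword scores and cosine similarities live on different scales.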
Open-Source Data Observability Tool for Airflow
June 1, 2026 – Present
Developed a Python sensor framework that injects sidecar containers alongside Airflow tasks, collecting record counts, freshness, and schema checks and streaming them to Kafka. Parsed Airflow DAG definitions and Spark execution plans to build an automated lineage graph in Neo4j, visualizable through a lightweight Streamlit UI. Implemented SLO-based alerting: pipelines missing freshness targets triggered Prometheus alerts and Slack notifications; integrated with PagerDuty for on-call rotations. Open-sourced the project on GitHub, where it gained 450+ stars and was adopted by 2 local data teams; reduced their mean-time-to-detect data issues from hours to under 10 minutes.
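A freshness SLO of the kind described can be reduced to a simple comparison between a dataset's last update time and an allowed staleness. The sketch below is illustrative only: `FreshnessSLO`, `check_freshness`, and the dataset name `orders` are hypothetical, and the routing of the resulting message to Prometheus, Slack, or PagerDuty is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessSLO:
    dataset: str              # hypothetical dataset identifier
    max_staleness: timedelta  # how old the latest data may be


def check_freshness(
    slo: FreshnessSLO, last_updated: datetime, now: datetime
) -> str | None:
    """Return an alert message if the SLO is breached, else None."""
    staleness = now - last_updated
    if staleness > slo.max_staleness:
        overdue = staleness - slo.max_staleness
        return f"{slo.dataset} missed its freshness target by {overdue}"
    return None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
slo = FreshnessSLO("orders", timedelta(minutes=30))
fresh = check_freshness(slo, now - timedelta(minutes=10), now)
stale = check_freshness(slo, now - timedelta(minutes=45), now)
```

Passing `now` explicitly keeps the check deterministic and easy to test; a scheduler would supply the current UTC time.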
Cultural Fit Analysis
The candidate's experience across multiple top-tier tech companies (NVIDIA, Datadog, Meta, Google) indicates adaptability and a strong cultural fit for fast-paced, innovative environments. Their involvement in open-source projects and contributions to data observability tools suggest a collaborative and community-oriented mindset. The diversity of projects, from real-time cryptocurrency pipelines to RAG-powered knowledge bases and data observability tools, showcases intellectual curiosity and a willingness to tackle varied challenges. The candidate's focus on driving business value and improving efficiency aligns well with performance-driven cultures.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills through complex system designs and optimizations. Their experience in open-sourcing a project and driving adoption indicates leadership and collaboration. The focus on reliability, uptime, and cost reduction highlights a strong operational mindset. Their ability to work across diverse teams and integrate various systems suggests excellent communication and stakeholder management capabilities. The candidate's product mindset is evident in their ability to translate technical solutions into tangible business value.