
Staff Data Engineer @ Prophecy | Building GrowDataSkills | YouTuber (184k+) | Data Engineering Educator | Public Speaker | Ex-Expedia, Amazon, McKinsey, PayTm
Experienced Data Engineer with a demonstrated history of solving complex data problems across various domains like Aviation, Pharmaceutical, FinTech, Telecom, and Employee Services. I've designed and built scalable data pipelines to handle vast amounts of data, both in batch and real-time environments. With a passion for taking ownership, I thrive on collaborating with business teams and stakeholders to drive impactful solutions. Beyond my professional journey, I am dedicated to empowering the next generation of data professionals through GrowDataSkills, a platform I co-founded to provide quality, hands-on learning at the most affordable rates. Since 2020, I've been actively contributing to the data engineering community by creating insightful content on my YouTube channel, E-Learning Bridge, where I share my experiences and knowledge through podcasts and practical lessons. LinkedIn has been instrumental in my growth, and I continue to use it as a platform to share my daily thoughts and ideas ❤️
Motilal Nehru National Institute Of Technology
Master of Computer Applications (M.C.A.), Computer Science and Engineering
2014 – 2017
University of Lucknow
B.Sc. in Computer Science
2011 – 2014
Prophecy
Staff Data Engineer
September 2025 – Present
Bengaluru, Karnataka, India · Hybrid
Prophecy
Data Engineer
March 2024 – September 2025
Bengaluru, Karnataka, India · Hybrid
Expedia Group
Data Engineer - III
November 2021 – February 2024
Gurugram, Haryana, India
Amazon
Data Engineer
March 2020 – November 2021
Bengaluru, Karnataka, India
QuantumBlack, AI by McKinsey
Data Engineer
December 2019 – March 2020
Gurgaon, Haryana, India
Paytm
Software Engineer (Big Data & DWH)
January 2019 – December 2019
Noida Area, India
OperaSolutions
Software Engineer II (Big Data & Analytics)
December 2018 – January 2019
OperaSolutions
Software Engineer I (Big Data & Analytics)
July 2017 – November 2018
OperaSolutions
Software Intern
January 2017 – June 2017
Salesforce to Redshift Ingestion - Migration from Informatica to Native AWS
February 2021 – November 2021
-> Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
- Crafted a generic, scalable native AWS solution for Salesforce-to-Redshift ingestion
- Moved the ingestion pipelines off the third-party tool Informatica, saving its heavy license fee
- The generic framework let other business units smoothly ingest newly onboarded Salesforce objects into the Redshift data lake
Incremental Ingestion Pipeline – Employee Benefits Data
May 2020 – February 2021
-> Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
- Built a generic, optimized ingestion pipeline for highly critical and confidential Employee Benefits data
- Designed the pipeline to handle gigabytes of daily and weekly data together for use cases such as Audit, Payroll, Reimbursement and Education Reimbursement
- Took complete ownership and worked closely with business teams to understand the requirements and deliver enriched dashboards
Automated Alerting System for Job Monitoring
March 2020 – May 2020
-> Tech Stack – Python, AWS CLI, QuickSight
- Created an automated alerting system for Redshift load metrics and job monitoring
- Saved each team member 1.5 hours/day of manual effort previously spent monitoring jobs and preparing the daily job status
Feature Development For Telecom Data
December 2019 – March 2020
-> Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
- Created large-scale, optimized pipelines for telecom data using PySpark and the Kedro framework
- Worked closely with the client to gather business requirements
- Implemented business logic to prepare clean, aggregated data for customer churn analysis
GG VMN Migration
September 2019 – November 2019
Tech Stack – PySpark, Hive, Azkaban, Jenkins
- Migrated all facts/OLAPs written in Hive to PySpark
- Created job flows in Azkaban
Data Ingestion & Sync Process
June 2019 – August 2019
Tech Stack – Python, Hive, Elasticsearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins
- Crafted data-sync logic that prioritizes datasets (High/Medium/Low tags) by criticality to meet SLOs
- Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
- Designed a REST API in the data ingestion layer for retention of GA data, to optimize cluster space
- Added exception handling to the data-sync logic, fixing multiple bugs
- Fixed missing PG data from Kafka for the UMP panel
- Created a new pipeline to ingest missing data from HDFS into Elasticsearch in case of cluster failure
Near Real Time Data Pipeline - POC
April 2019 – June 2019
Tech Stack – Java, Spark, Kafka, DataStax Cassandra, DataStax Studio, ZooKeeper, Maven
- Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on the production MySQL database. The objective was to move business users off production to overcome data leaks and security issues
- Interacted with business users to learn their use cases, ingestion tables and PII data, and built data models accordingly for faster inserts and updates
- Set up the DataStax Studio web interface so users could query real-time data from Cassandra with LDAP authentication
Dehleez - Report Scheduling Tool
February 2019 – April 2019
Tech Stack – Python, JavaScript, Django, Azkaban, Docker, Hive, Ajax, Bootstrap, REST API, Datadog
- Enhanced Paytm's proprietary report scheduling tool, which lets business users doing data analysis schedule reports by writing Hive/MySQL/Cassandra queries and receive the output in various formats. Features added:
  - Diff Checker: admins can check the differences between queries before approving reports
  - Time slot picker: users can see the reports already scheduled in the 4 hours following their intended schedule time and pick a slot accordingly
  - Dump report output into S3: users can dump report output into an AWS S3 bucket
  - Cassandra Connector: users can schedule reports with Cassandra query panels in addition to Hive/MySQL
Hive Query Parser
February 2019 – March 2019
Tech Stack – Django, Django REST framework, Python, NGINX
- Query Validator and Optimization Engine: created a Django web application to parse and validate users' Hive queries. For a bad query (missing partition columns, unbalanced joins), it also suggests how to improve the query
- PII Detector: built a Django web application to detect all running Hive queries that fetch PII data
Procurement Spend Optimizer
July 2018 – January 2019
- Developed a CXO-level insights engine to manage USD 60Bn in procurement spend; the engine enabled cost optimization through smart categorisation, benchmarking and anomaly detection
- Crafted a Big Data-based solution that organised structured and unstructured data
- Built the solution using the Hadoop ecosystem (HDFS, YARN), Spark and Python
- Built a Google Translate API-based solution to automate the legacy translation engine; improved record aggregation accuracy by 50% and saved the team 120 hours/month
Technologies Used: Hadoop Framework, Spark
Languages Used: Java, Python 2.7
Tools Used: Signal Hub (Opera's proprietary development framework), Signal Hub Manager (SHM)
Version Control: SVN
Trip Narrative
January 2017 – June 2018
- Deployed an end-to-end solution for a leading US airline, aggregating a 360-degree view of customer engagement throughout the trip life cycle
- Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
- The solution powers a web application used by 1000+ CSRs and decision makers
- Built the application on RESTful APIs using the Hadoop ecosystem (HDFS, YARN), DataRush Applications (distributed processing engine), SQL and Python
Technologies Used: Hadoop Framework, REST, Ingres DB, NGINX
Languages Used: Java, Python 2.7, YAML, SQL
Tools Used: Signal Hub (Opera's proprietary development framework), Signal Hub Manager (SHM)
Version Control: SVN
BlueChat
February 2016 – March 2016
BlueChat is an Android chat application that uses the Bluetooth stack to send text, images and contacts. Text messages are encrypted and decrypted on the fly.
Technologies Used: Android
Languages Used: Java, XML
Tools Used: Android Studio 2.0
Cultural Fit Analysis
The candidate's diverse project portfolio across multiple companies (Expedia, Amazon, QuantumBlack, Paytm, OperaSolutions) demonstrates adaptability and exposure to different organizational cultures and problem domains. The roles consistently align with Big Data Engineering, indicating a clear career path and passion for the domain. The breadth of technologies used across projects suggests a willingness to learn and adapt to new tools and frameworks, which is a positive indicator for cultural fit in dynamic environments.
Soft Skills & Operational Fit
The candidate's project descriptions highlight ownership, collaboration with business teams, and a focus on optimizing processes and saving manual effort, indicating strong operational fit and problem-solving soft skills. The mention of Amazon's Leadership Principles (Customer Obsession, Earn Trust and Think Big) suggests alignment with results-oriented, customer-focused work environments.