Shashank Mishra

Big Data Engineer

Key Strengths

Extensive experience in Big Data technologies including Hadoop Ecosystem (HDFS, YARN), Spark, PySpark, Hive, and Kafka.
Strong background in cloud platforms, particularly AWS (S3, Lambda, Glue, EMR, Redshift, SNS, DynamoDB) and Azure (Databricks).
Proven ability to design and implement scalable data ingestion and processing pipelines, including real-time streaming with Flink and Kafka.
Experience with data warehousing concepts and tools like Redshift and Oracle.
Demonstrated ability to optimize existing pipelines, reduce execution time, and improve alerting systems.
Experience with various programming languages relevant to Big Data: Python, Java, Scala, Shell Scripting.
Track record of taking ownership of projects and working closely with business teams to understand requirements and deliver solutions.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's diverse project portfolio across multiple companies (Expedia, Amazon, QuantumBlack, Paytm, OperaSolutions) demonstrates adaptability and exposure to different organizational cultures and problem domains. The roles consistently align with Big Data Engineering, indicating a clear career path and passion for the domain. The breadth of technologies used across projects suggests a willingness to learn and adapt to new tools and frameworks, which is a positive indicator for cultural fit in dynamic environments.

Soft Skills & Operational Fit

The candidate's project descriptions highlight ownership, collaboration with business teams, and a focus on optimizing processes and reducing manual effort, indicating strong operational fit and problem-solving skills. The mention of Leadership Principles such as 'Customer Obsession', 'Earn Trust' and 'Think Big' at Amazon suggests alignment with results-oriented, customer-focused work environments.

Experience

Prophecy

Staff Data Engineer

September 1, 2025 – Present

Bengaluru, Karnataka, India · Hybrid

Prophecy

Data Engineer

March 1, 2024 – September 1, 2025

Bengaluru, Karnataka, India · Hybrid

Expedia Group

Data Engineer - III

November 1, 2021 – February 1, 2024

Gurugram, Haryana, India

Amazon

Data Engineer

March 1, 2020 – November 1, 2021

Bengaluru, Karnataka, India

QuantumBlack, AI by McKinsey

Data Engineer

December 1, 2019 – March 1, 2020

Gurgaon, Haryana, India

Paytm

Software Engineer (BigData & DWH)

January 1, 2019 – December 1, 2019

Noida Area, India

OperaSolutions

Software Engineer - II (BigData & Analytics)

December 1, 2018 – January 1, 2019

OperaSolutions

Software Engineer-1 (BigData & Analytics)

July 1, 2017 – November 1, 2018

OperaSolutions

Software Intern

January 1, 2017 – June 1, 2017

Projects

Salesforce to Redshift Ingestion - Migration from Informatica to Native AWS

February 1, 2021 – November 1, 2021

Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
- Crafted a generic, scalable native AWS solution for Salesforce-to-Redshift ingestion
- Moved ingestion pipelines off the third-party tool Informatica, saving its heavy license fees
- The generic framework helped other business units smoothly ingest newly onboarded Salesforce objects into the Redshift data lake

Incremental Ingestion pipeline – Employee Benefits Data

May 1, 2020 – February 1, 2021

Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
- Built a generic, optimized ingestion pipeline for highly critical and confidential Employee Benefits data
- Designed the pipeline to handle GBs of daily and weekly data together for use cases such as Audit, Payroll, Reimbursement, and Education Reimbursement
- Took complete ownership and worked closely with business teams to understand requirements and deliver enriching dashboards

Automated Alerting System for Job Monitoring

March 1, 2020 – May 1, 2020

Tech Stack – Python, AWS CLI, QuickSight
- Created an automated alerting system for Redshift load metrics and job monitoring
- Saved each team member 1.5 hours/day of manual effort spent monitoring jobs and preparing the Daily Job Status

Feature Development For Telecom Data

December 1, 2019 – March 1, 2020

Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
- Created large-scale, optimized pipelines for telecom data using PySpark and the Kedro framework
- Worked closely with the client to gather business requirements
- Implemented business logic to prepare clean, aggregated data for Customer Churn Analysis

GG VMN migration

September 1, 2019 – November 1, 2019

Tech Stack – PySpark, Hive, Azkaban, Jenkins
- Migrated all Facts/OLAPs written in Hive to PySpark
- Created job flows in Azkaban

Data Ingestion & Sync Process

June 1, 2019 – August 1, 2019

Tech Stack – Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins
- Crafted data-sync logic that prioritizes datasets (High/Medium/Low tag) by criticality to meet the SLO
- Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
- Designed a REST API in data ingestion for retention of GA data in order to optimize cluster space
- Added exception handling to the data-sync logic to fix multiple bugs
- Fixed missing PG data from Kafka for the UMP panel
- Created a new pipeline to ingest missing data from HDFS into ElasticSearch in case of cluster failure
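As an illustration only, the priority tagging and preemption described above could be sketched as follows. This is not the original implementation: the `SyncScheduler` class, the `PRIORITY` ranks, and the fixed slot count are all assumptions introduced for the example.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical ranks for the High/Medium/Low tags; lower number = more critical.
PRIORITY = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class SyncScheduler:
    """Toy scheduler with a fixed number of slots that preempts less critical syncs."""
    slots: int
    running: dict = field(default_factory=dict)  # dataset -> tag
    waiting: list = field(default_factory=list)  # heap of (rank, seq, dataset, tag)
    _seq: int = 0                                # tie-breaker to keep FIFO order per rank

    def submit(self, dataset: str, tag: str) -> None:
        """Start the sync if a slot is free; otherwise preempt or queue it."""
        if len(self.running) < self.slots:
            self.running[dataset] = tag
            return
        # Pick the least critical sync that is currently running.
        victim = max(self.running, key=lambda d: PRIORITY[self.running[d]])
        if PRIORITY[self.running[victim]] > PRIORITY[tag]:
            # Preempt: move the victim back to the queue and run the new sync.
            self._enqueue(victim, self.running.pop(victim))
            self.running[dataset] = tag
        else:
            self._enqueue(dataset, tag)

    def finish(self, dataset: str) -> None:
        """Release a slot and start the most critical waiting sync, if any."""
        del self.running[dataset]
        if self.waiting:
            _, _, nxt, tag = heapq.heappop(self.waiting)
            self.running[nxt] = tag

    def _enqueue(self, dataset: str, tag: str) -> None:
        heapq.heappush(self.waiting, (PRIORITY[tag], self._seq, dataset, tag))
        self._seq += 1
```

With two slots both occupied by Low syncs, submitting a High dataset moves one Low sync back to the waiting queue; it resumes once a slot is released through `finish`.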

Near Real Time Data Pipeline - POC

April 1, 2019 – June 1, 2019

Tech Stack – Java, Spark, Kafka, Datastax Cassandra, Datastax Studio, Zookeeper, Maven
- Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on production MySQL. The objective was to move business users off production to address data leaks and security issues
- Worked with business users to understand their use cases, ingestion tables, and PII data, and built data models accordingly for faster inserts and updates
- Set up the Datastax Studio web interface so users can query real-time data from Cassandra using LDAP authentication

Dehleez - Report Scheduling Tool

February 1, 2019 – April 1, 2019

Tech Stack – Python, JavaScript, Django, Azkaban, Docker, Hive, Ajax, Bootstrap, REST API, DataDog
- Enhanced Paytm's proprietary report scheduling tool, used by business users doing data analysis to schedule reports by writing Hive/MySQL/Cassandra queries and to receive report output in various formats
- Diff Checker: admins can check the difference between queries before approving reports
- Time slot picker: when scheduling a report, users can see the reports scheduled for the next 4 hours from the intended schedule time and pick a slot accordingly
- Dump report output into S3 bucket: users can dump report output into an AWS S3 bucket
- Cassandra Connector: users can schedule reports with Cassandra query panels in addition to Hive/MySQL

Hive Query Parser

February 1, 2019 – March 1, 2019

Tech Stack – Django, Django REST Framework, Python, NGINX
- Query Validator and Optimization Engine: created a Django web application to parse and validate users' Hive queries. For a bad query (missing partition columns or unbalanced joins), it also suggests improvements
- PII Detector: built a Django web application to detect all running Hive queries that fetch PII data
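A minimal sketch of the missing-partition-column check, assuming a hypothetical `PARTITIONS` catalog and a regex heuristic in place of a real Hive parser. The function name and approach are illustrative assumptions, not the tool's actual code.

```python
import re

# Hypothetical catalog: table name -> partition columns.
PARTITIONS = {"orders": ["dt"], "events": ["dt", "region"]}


def missing_partition_filters(query: str, partitions: dict) -> dict:
    """Return {table: [partition columns not mentioned after WHERE]}.

    Heuristic only: it does not resolve aliases, subqueries, or CTEs.
    """
    sql = query.lower()
    # Table names that directly follow FROM or JOIN.
    tables = re.findall(r"\b(?:from|join)\s+([a-z_][\w.]*)", sql)
    # Everything after the first WHERE keyword, or empty if there is none.
    parts = re.split(r"\bwhere\b", sql, maxsplit=1)
    where = parts[1] if len(parts) > 1 else ""
    problems = {}
    for table in tables:
        cols = partitions.get(table, [])
        missing = [c for c in cols if not re.search(rf"\b{re.escape(c)}\b", where)]
        if missing:
            problems[table] = missing
    return problems
```

A non-empty result can be turned into a suggestion such as "add a filter on `dt`". A production validator would more likely walk a parsed query tree than rely on pattern matching.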

Procurement Spend Optimizer

July 1, 2018 – January 1, 2019

- Developed a CXO-level insights engine to manage USD 60Bn; the engine enabled cost optimization using smart categorisation, benchmarking and anomaly detection
- Crafted a Big Data based solution; organised structured and unstructured data
- Built the solution using the Hadoop Ecosystem (HDFS, YARN), Spark and Python
- Built a Google Translate API based solution to automate the legacy translation engine; improved record aggregation accuracy by 50% and saved the team 120 hours/month
Technologies Used: Hadoop Framework, Spark
Languages Used: Java, Python 2.7
Tools Used: Signal Hub (Opera's proprietary development framework), Signal Hub Manager (SHM)
Version Control: SVN

Trip Narrative

January 1, 2017 – June 1, 2018

- Deployed an end-to-end solution for a leading US airline; aggregated a 360-degree view of the customer's engagement throughout the life cycle of the trip
- Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
- The solution powers a web application used by 1000+ CSRs and decision makers
- Built the application on RESTful APIs using the Hadoop Ecosystem (HDFS, YARN), DataRush Applications (Distributed Processing Engine), SQL and Python
Technologies Used: Hadoop Framework, REST, Ingres DB, NGINX
Languages Used: Java, Python 2.7, YAML, SQL
Tools Used: Signal Hub (Opera's proprietary development framework), Signal Hub Manager (SHM)
Version Control: SVN

BlueChat

February 1, 2016 – March 1, 2016

BlueChat is an Android chat application that uses the Bluetooth stack to send text, images and contacts. Text messages are also encrypted and decrypted on the fly.
Technologies Used: Android
Languages Used: Java, XML
Tools Used: Android Studio 2.0
