VIVEK JAIN

Data Engineer

https://www.opentalent.in/vivek-jain-3138348

Data Engineer with 5+ years in PySpark, Azure Databricks & Delta Lake

Energy Exemplar

Key Strengths

Extensive experience in building and optimizing large-scale ETL/ELT pipelines using PySpark, Azure Databricks, and Delta Lake.
Strong proficiency in data orchestration tools like Apache Airflow and Azure Data Factory, with a proven track record of high pipeline uptime.
Demonstrated ability to improve data processing efficiency (e.g., reducing batch times, optimizing Spark jobs) and ensure data quality (validation, ACID guarantees).
Solid understanding of data modeling (star/snowflake schema) and experience with various databases (SQL Server, MongoDB, MySQL, PostgreSQL).
Experience with both batch and real-time data processing, including Spark Streaming.
Mentorship experience with junior engineers, indicating leadership potential.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's project diversity, spanning enterprise ETL platforms, real-time processing, and analytics pipelines, indicates adaptability and a broad skill set relevant to various data engineering challenges. Their experience across different companies (Energy, Financial, Staffing) suggests an ability to integrate into diverse organizational cultures. The explicit mention of mentoring junior engineers aligns with a collaborative and growth-oriented environment. The certifications in Data Science and SQL further demonstrate a commitment to continuous learning and skill development, which is a positive cultural indicator.

Soft Skills & Operational Fit

The candidate's resume highlights strong problem-solving skills through performance optimization and data quality improvements. Their experience in mentoring junior engineers suggests good collaboration and leadership potential. The focus on delivering 'genuinely useful' data indicates a user-centric and results-oriented approach. The consistent achievement of high pipeline success rates and uptime demonstrates reliability and attention to operational excellence.

About

Data Engineer with 5+ years of experience building scalable data pipelines and distributed processing systems that organisations rely on every day. I specialise in PySpark, Azure Databricks, Delta Lake, and Apache Airflow, and I take pride in making large-scale data processing fast, reliable, and cost-effective. My background spans the full data engineering lifecycle: ETL/ELT design, Spark optimisation, cloud deployment, and production monitoring. I have delivered Azure solutions processing 10M+ records/hour at 99%+ uptime, always focused on producing data that is genuinely useful to the analysts and business stakeholders who depend on it.

Top Skills

Apache Spark, Data Lake, Python

Experience

Energy Exemplar

Software Engineer

August 1, 2025 – March 1, 2026

India

Capgemini

Senior Software Engineer

April 1, 2022 – August 1, 2025

India

Rapid Staffing & Training Solutions

Python Developer

November 1, 2020 – April 1, 2022

India

Projects

Enterprise Databricks ETL Platform

June 1, 2026 – Present

• Architected an end-to-end ETL platform on Databricks that ingests data from 6 source systems into a Delta Lake lakehouse, cutting nightly batch time from 4 hours to 45 minutes (81% improvement) via incremental loads and SCD Type 2.
• Configured Airflow DAGs with dynamic task generation and Slack alerting, achieving 99.5% pipeline success over 12 months in production.

Real-Time and Batch Data Processing System

June 1, 2026 – Present

• Built Spark Streaming pipelines that cut data latency from 8 hours to under 15 minutes, using Delta Lake merge operations to ensure exactly-once semantics, and achieved a mean time to detection of under 5 minutes for any failure.

Data Analytics Pipeline (Python + Power BI)

June 1, 2026 – Present

• Automated a full source-to-dashboard pipeline across 4 source systems using Pandas and SQLAlchemy, eliminating 100% of manual data preparation and cutting dashboard refresh lag from 24 hours to under 1 hour.

Certifications

Data Visualization with Python

IBM/Coursera

June 1, 2026 – Present

SQL and Relational Databases

IBM

June 1, 2026 – Present

Data Engineering Fundamentals

Coursera

June 1, 2026 – Present

AZ-900

In Progress

June 1, 2026 – Present

Data Science with Python

IBM/Coursera

June 1, 2026 – Present
