Data Engineer with 5+ years in PySpark, Azure Databricks & Delta Lake
Data Engineer with 5+ years of experience building scalable data pipelines and distributed processing systems that organisations rely on every day. I specialise in PySpark, Azure Databricks, Delta Lake, and Apache Airflow, and I take pride in making large-scale data processing fast, reliable, and cost-effective. My background spans the full data engineering lifecycle: ETL/ELT design, Spark optimisation, cloud deployment, and production monitoring. I have delivered Azure solutions processing 10M+ records/hour at 99%+ uptime, always focused on producing data that is genuinely useful to the analysts and business stakeholders who depend on it.
PDM University
MCA · Computer Science
January 1, 2018 – January 1, 2020
Energy Exemplar
Software Engineer
August 1, 2025 – March 1, 2026
India
Capgemini
Senior Software Engineer
April 1, 2022 – August 1, 2025
India
Rapid Staffing & Training Solutions
Python Developer
November 1, 2020 – April 1, 2022
India
Enterprise Databricks ETL Platform
June 1, 2026 – Present
• Architected an end-to-end ETL platform on Databricks ingesting from 6 source systems into a Delta Lake lakehouse, cutting nightly batch time from 4 hours to 45 minutes (81% improvement) via incremental loads and SCD Type 2.
• Configured Airflow DAGs with dynamic task generation and Slack alerting, achieving 99.5% pipeline success over 12 months in production.
Real-Time and Batch Data Processing System
June 1, 2026 – Present
• Built Spark Streaming pipelines that reduced data latency from 8 hours to under 15 minutes, using Delta Lake merge operations to ensure exactly-once semantics and keeping mean time to detection under 5 minutes for any failure.
Data Analytics Pipeline (Python + Power BI)
June 1, 2026 – Present
• Automated the full source-to-dashboard pipeline using Pandas and SQLAlchemy across 4 source systems, eliminating 100% of manual prep and cutting dashboard refresh lag from 24 hours to under 1 hour.
Data Visualization with Python
IBM/Coursera
June 1, 2026 – Present
SQL and Relational Databases
IBM
June 1, 2026 – Present
Data Engineering Fundamentals
Coursera
June 1, 2026 – Present
AZ-900
In Progress
June 1, 2026 – Present
Data Science with Python
IBM/Coursera
June 1, 2026 – Present
Cultural Fit Analysis
The candidate's project diversity, spanning enterprise ETL platforms, real-time processing, and analytics pipelines, indicates adaptability and a broad skill set relevant to various data engineering challenges. Their experience across different companies (Energy, Financial, Staffing) suggests an ability to integrate into diverse organizational cultures. The explicit mention of mentoring junior engineers aligns with a collaborative and growth-oriented environment. The certifications in Data Science and SQL further demonstrate a commitment to continuous learning and skill development, which is a positive cultural indicator.
Soft Skills & Operational Fit
The candidate's resume highlights strong problem-solving skills through performance optimization and data quality improvements. Their experience in mentoring junior engineers suggests good collaboration and leadership potential. The focus on delivering 'genuinely useful' data indicates a user-centric and results-oriented approach. The consistent achievement of high pipeline success rates and uptime demonstrates reliability and attention to operational excellence.