Sourabh Joshi

Key Strengths

Extensive experience (13 years) in data engineering, data warehousing, and cloud-native data platforms, which aligns well with a senior Data Analyst role requiring deep data understanding.
Demonstrated leadership in designing and evolving modern data engineering solutions, including lakehouse architectures (Iceberg, Hudi), real-time streaming, and GenAI data pipelines.
Strong background in data governance, data quality frameworks (Great Expectations, Soda, Deequ), and FinOps for data platforms, indicating a holistic approach to data management.
Proficiency in key technologies such as AWS, Apache Spark, Kafka, Flink, Airflow, dbt, and Python, which are critical for advanced data analysis and platform interaction.
Experience in building and optimizing ETL workflows, data integration, and reporting solutions, directly supporting the core functions of a Data Analyst.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's experience spans several large enterprises (Barclays, Tesco, ANZ, Accenture) and diverse projects, from AI-ready lakehouses to FinOps and data reliability platforms. This breadth suggests adaptability and a willingness to tackle varied challenges, and the focus on establishing frameworks and driving best practices indicates a proactive, improvement-oriented mindset. However, the target role is 'Data Analyst', while the experience is heavily skewed towards 'Data Engineer' and 'Lead Data Engineer' positions. The underlying data skills are relevant, but moving from building and leading data platforms to primarily analyzing data may require adjustment. The projects show a strong engineering and architectural bent, so the candidate could be overqualified for, or misaligned with, a Data Analyst role that is purely consumption-focused.

Soft Skills & Operational Fit

The candidate's project descriptions highlight a strong focus on data reliability, cost optimization, and framework-building (DataOps, data quality), suggesting an operational mindset and an ability to drive best practices. Experience leading initiatives and collaborating across teams points to solid leadership and teamwork skills. The pursuit of a Master's in Psychology suggests an interest in human behavior and may support the analytical and problem-solving skills needed to understand business requirements and user needs.


About

𝗜 𝗕𝗨𝗜𝗟𝗗 𝗦𝗖𝗔𝗟𝗔𝗕𝗟𝗘 𝗗𝗔𝗧𝗔 𝗣𝗟𝗔𝗧𝗙𝗢𝗥𝗠𝗦 𝗧𝗛𝗔𝗧 𝗧𝗥𝗔𝗡𝗦𝗙𝗢𝗥𝗠 𝗖𝗢𝗠𝗣𝗟𝗘𝗫 𝗗𝗔𝗧𝗔 𝗜𝗡𝗧𝗢 𝗕𝗨𝗦𝗜𝗡𝗘𝗦𝗦 𝗜𝗡𝗦𝗜𝗚𝗛𝗧𝗦.

As a Lead Data Engineer at Barclays with 13+ years of experience, I specialize in Data Engineering, Cloud Data Platforms, Data Architecture, Big Data, and Analytics. My career has evolved from ETL development to leading enterprise-scale data initiatives, helping organizations build reliable, scalable, and high-performing data ecosystems.

𝗘𝗫𝗣𝗘𝗥𝗧𝗜𝗦𝗘

I bring deep expertise in AWS, Apache Spark, Airflow, Kafka, Flink, Snowflake, Databricks, dbt, Python, Cassandra, Neo4j, Oracle, Delta Lake, Apache Hudi, and Apache Iceberg. My focus is on designing modern data platforms, optimizing data pipelines, strengthening data quality, and enabling data-driven decision-making at scale.

𝗞𝗘𝗬 𝗜𝗠𝗣𝗔𝗖𝗧

• Spearheaded enterprise data warehousing, analytics, and information delivery roadmaps.
• Established a comprehensive DataOps framework for Data and Analytics teams.
• Architected data quality, automated validation, and testing frameworks using BDD and TDD methodologies.
• Built internal collaboration platforms and automation solutions that improved delivery efficiency.
• Developed a sample data generator tool that reduced ETL testing time by 30%.
• Delivered cloud and platform optimization initiatives focused on operational efficiency and cost reduction.

𝗕𝗘𝗬𝗢𝗡𝗗 𝗧𝗛𝗘 𝗥𝗢𝗟𝗘

I actively contribute to the open-source data community, write about modern data engineering practices, and explore emerging technologies across AI, Machine Learning, and Data Infrastructure.

𝗖𝗘𝗥𝗧𝗜𝗙𝗜𝗖

Top Skills

Data Warehousing, Data Analytics, Cloud Data Engineering, Data Architecture, ETL, Apache Spark, Banking, Bash, Python, SQL, SAS, PySpark, Apache Kafka, Pentaho, DataStage, Data Engineering, Big Data Analytics

Experience

Barclays UK

Lead Data Engineer

November 1, 2024 – Present

Bengaluru, Karnataka, India · On-site

Tesco Bengaluru

Data Engineering Lead

September 1, 2021 – November 1, 2024

Tesco Bengaluru

Data Engineer

October 1, 2016 – August 1, 2021

ANZ

Technical Analyst

April 1, 2015 – September 1, 2016

Bengaluru Area, India

Accenture in India

Software Engineering Analyst

February 1, 2013 – April 1, 2015

Bangaon Area, India

Enverus

Intern

November 1, 2012 – January 1, 2013

Bangalore

Projects

AI-ready lakehouse with governed RAG

February 1, 2026 – Present

Designed and built an open-table-format lakehouse (Apache Iceberg on S3) that serves both BI and GenAI from one governed source of truth. Ingestion via Airbyte and Kafka feeds a medallion architecture; a downstream embedding pipeline chunks and vectorises curated Gold-layer data into a vector store for retrieval-augmented generation. Data contracts, Great Expectations validation, Spline lineage and column-level PII masking are enforced before any document reaches an embedding model — so the AI layer inherits trusted, governed data by default. Stack: Apache Iceberg, S3, Glue, Airbyte, Kafka, dbt, Airflow, Great Expectations, Spline, a vector store (pgvector / OpenSearch), embeddings + LLM.
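The key idea in this project is ordering: governance checks run before text is chunked and sent for embedding. The sketch below illustrates that ordering in plain Python under stated assumptions. The required contract fields (`doc_id`, `text`), the email regex used as a stand-in for column-level PII masking, and the word-window chunking with overlap are all hypothetical choices for illustration. It does not call Great Expectations, Spline, or any embedding or vector-store API.

```python
import re
from dataclasses import dataclass

# Hypothetical minimal contract: fields every curated (Gold-layer) record must carry.
REQUIRED_FIELDS = {"doc_id", "text"}

# Simple email pattern used here as a stand-in for real PII masking rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str


def passes_contract(record: dict) -> bool:
    """Reject records that are missing required fields or have empty text."""
    return REQUIRED_FIELDS.issubset(record) and bool(str(record["text"]).strip())


def mask_pii(text: str) -> str:
    """Replace email addresses with a fixed token before any embedding step."""
    return EMAIL_RE.sub("[EMAIL]", text)


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words, sharing `overlap` words between windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def prepare_for_embedding(records, size=50, overlap=10) -> list[Chunk]:
    """Govern first, then chunk: only contract-valid, PII-masked text is emitted."""
    out = []
    for rec in records:
        if not passes_contract(rec):
            continue  # blocked before it can reach an embedding model
        masked = mask_pii(rec["text"])
        for i, piece in enumerate(chunk_words(masked, size, overlap)):
            out.append(Chunk(rec["doc_id"], i, piece))
    return out
```

In this sketch a record that fails the contract simply never produces chunks, which is one way to make the AI layer "inherit" governance rather than re-check it downstream.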

Real-time streaming lakehouse for ML features and live analytics

January 1, 2026 – Present

Built a near-real-time streaming lakehouse that powers live dashboards and online ML features from the same pipeline. Application events and database CDC stream through Kafka into Apache Flink for windowed aggregations and enrichment, then land as upserts in Apache Hudi on S3. Curated outputs feed Apache Superset for sub-minute operational dashboards and a feature store for low-latency model serving — collapsing the usual gap between analytics and ML data. Stack: Kafka, Apache Flink, Apache Hudi, S3, Spark, Apache Superset, feature store, Python.
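Two operations carry this design: windowed aggregation in the stream processor and key-based upserts into the table format. The plain-Python sketch below simulates both under stated assumptions. It uses fixed, non-overlapping (tumbling) windows, and it resolves upserts by record key plus an ordering field, in the spirit of Hudi's record-key and precombine-field concepts. The event shape (`key`, `ts`) and the field names are hypothetical, and this is not Flink or Hudi code.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (key, window_start) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for ev in events:
        # Align each timestamp down to the start of its window.
        window_start = ev["ts"] - (ev["ts"] % window_seconds)
        counts[(ev["key"], window_start)] += 1
    return dict(counts)


def upsert(table, rows, key_field, order_field):
    """Merge rows into `table` by key; the row with the larger ordering value wins.

    Because older rows lose, late or replayed events cannot overwrite newer state.
    """
    for row in rows:
        k = row[key_field]
        current = table.get(k)
        if current is None or row[order_field] >= current[order_field]:
            table[k] = row
    return table
```

Keeping the ordering rule inside the upsert, rather than relying on arrival order, is what lets the same curated output safely feed both dashboards and online features.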

FinOps / cost-optimized data platform

April 1, 2025 – November 1, 2025

Led a cost-optimization initiative across cloud data pipelines, introducing usage observability, right-sized compute, partition and file compaction on the lakehouse, and query optimization on Athena and Glue. Established a FinOps feedback loop with per-pipeline cost attribution so that teams could see and own their spend. Stack: AWS (Glue, Athena, S3, Lambda, CloudFormation), Iceberg/Hudi compaction, cost dashboards, Python.
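Per-pipeline cost attribution can be illustrated as a group-by over tagged billing line items. The sketch below assumes each line item has already been exported as a record with a `cost` value and a `tags` dictionary, and that pipelines are identified by a tag named `pipeline`; both the record shape and the tag name are hypothetical. Untagged spend is kept visible as "unattributed" rather than being silently spread across teams.

```python
from collections import defaultdict


def attribute_costs(line_items, tag_key="pipeline"):
    """Sum cost per pipeline tag; spend without the tag is reported separately."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unattributed")
        totals[owner] += item["cost"]
    return dict(totals)


def cost_shares(totals):
    """Return each owner's fraction of total spend (empty if there is no spend)."""
    grand = sum(totals.values())
    return {owner: cost / grand for owner, cost in totals.items()} if grand else {}


def top_spenders(totals, n=3):
    """List the n largest owners by cost, highest first, for a cost dashboard."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Surfacing the "unattributed" bucket is a deliberate choice here: a growing untagged share is usually the first signal that the feedback loop is breaking down.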

Data reliability platform — contracts, quality gates and observability

February 1, 2024 – October 1, 2025

Designed a data reliability framework that treats data quality as a CI/CD concern. Producer-consumer data contracts define schema and freshness SLAs; every Airflow/dbt run passes through automated quality gates (Great Expectations, Soda, Deequ) that block bad data from propagating. Spline lineage plus anomaly alerting give downstream teams full observability — cutting reactive pipeline firefighting and raising trust in the data feeding analytics and AI. Stack: Airflow, dbt, Great Expectations, Soda SQL, Deequ, Spline, data contracts, CI/CD.
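A quality gate of this kind boils down to checking a batch against a contract and stopping the run when it fails. The sketch below shows that shape in plain Python under stated assumptions: the contract is a hypothetical mapping of column names to Python types plus a freshness SLA, and the gate raises a hypothetical `ContractViolation` exception so an orchestrator task would fail instead of passing bad data downstream. It does not use the Great Expectations, Soda, or Deequ APIs.

```python
from datetime import datetime, timedelta, timezone


class ContractViolation(Exception):
    """Raised to stop a pipeline run before bad data propagates."""


def check_contract(rows, schema, last_updated, max_staleness, now=None):
    """Return a list of contract violations (empty if the batch is acceptable)."""
    if now is None:
        now = datetime.now(timezone.utc)
    errors = []
    for i, row in enumerate(rows):
        # Schema check: every contracted column must be present...
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing {sorted(missing)}")
        # ...and present columns must have the contracted type.
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                errors.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, expected {typ.__name__}"
                )
    # Freshness SLA: the source must have been updated recently enough.
    age = now - last_updated
    if age > max_staleness:
        errors.append(f"stale: last update {age} ago exceeds SLA {max_staleness}")
    return errors


def quality_gate(rows, schema, last_updated, max_staleness, now=None):
    """Pass the batch through unchanged, or raise to block the downstream step."""
    errors = check_contract(rows, schema, last_updated, max_staleness, now)
    if errors:
        raise ContractViolation("; ".join(errors))
    return rows
```

Raising, rather than logging and continuing, is what turns the check into a gate: the failed task becomes the alert, and downstream consumers never see the batch.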

Certifications

Microsoft Certified: Azure Fundamentals

Microsoft

June 25, 2026 – Present

IBM Certified Data Engineer - Big Data

IBM

June 25, 2026 – Present

Astronomer Certification for Apache Airflow Fundamentals

Astronomer

June 25, 2026 – Present

IBM Certified Solution Developer - InfoSphere DataStage v9.1

IBM

June 25, 2026 – Present

AWS Certified Data Engineer – Associate

Amazon Web Services (AWS)

June 25, 2026 – Present

Data Lake - Databricks

Databricks

June 25, 2026 – Present

Developer Associate

Amazon Web Services (AWS)

June 25, 2026 – Present

Machine Learning

Coursera

June 25, 2026 – Present
