Nokul Debanath

Data Engineer

https://www.opentalent.in/nokul-debanath

Data Engineer with less than a year in ETL pipelines and lakehouse architectures with 10 months of p

Pretoria, Gauteng, South Africa

Key Strengths

Strong foundational understanding of data lakehouse architectures, particularly with Apache Iceberg.
Demonstrated ability to design and implement end-to-end ETL/ELT pipelines for various data types (time-series, economic news).
Experience with distributed processing frameworks like Apache Spark for high-volume data.
Proficiency in Python for data manipulation, scripting, and pipeline development.
Exposure to ML integration, including feature engineering and model tracking with MLflow.
Understanding of data partitioning strategies and schema evolution for optimized storage and querying.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's projects demonstrate a strong interest in modern data engineering concepts, particularly around data lakehouses and real-time data processing. The personal nature of all projects suggests self-motivation and a drive to learn and build. The breadth of skills across data ingestion, processing, storage, and ML integration aligns well with a dynamic data engineering environment. However, the lack of team-based project experience or professional roles makes it challenging to fully assess cultural fit in a collaborative work setting.

Soft Skills & Operational Fit

The candidate's project descriptions indicate a proactive and experimental approach to learning new technologies (e.g., PyIceberg for local lakehouse). The focus on automation, reliability, and auditability in pipelines suggests an operational mindset. However, without direct work experience, it's difficult to assess collaboration, stress handling, or communication in a team setting.

About

Data Engineer specializing in Python, SQL, Apache Spark, and Docker, with hands-on experience building scalable ETL pipelines, distributed data processing systems, and modern lakehouse architectures. Skilled in developing high-throughput data platforms, optimizing data storage and partitioning strategies, and deploying containerized workflows for analytics and machine learning applications.

Top Skills

Python, SQL, Apache Spark, Docker, ETL/ELT pipelines

Projects

Apache Iceberg PyIceberg Local Data Lakehouse

February 1, 2026 – February 1, 2026

Developed a local data lakehouse using Apache Iceberg (PyIceberg) for schema-aware, ACID-style table storage. Implemented Parquet time-series ingestion with deduplication, quality checks, and audit logging. Integrated a scheduler and folder-watcher automation for regular pipeline runs without cloud dependencies, all within a lightweight, Python-native environment for experimenting with modern lakehouse concepts.

Big Data Project Jan 2026 (Tick Data)

December 1, 2025 – February 1, 2026

Architected and deployed a scalable data lakehouse using Apache Iceberg, object storage, and a JDBC-backed catalog for large-scale, versioned tick data with full ACID guarantees. Built an end-to-end, partitioned ETL pipeline that ingests, validates, cleans, and transforms high-frequency market tick data into analytics- and ML-ready feature tables at multi-million-record scale. Designed a production-style data workflow integrating NiFi ingestion, Spark parallel processing, and ML feature engineering to support model training and real-time signal generation.

Economic News Data Pipeline

June 1, 2025 – November 1, 2025

Constructed a full end-to-end ETL pipeline using MQL5 monthly economic release data, automating extraction, transformation, and loading of economic events. Formulated predictive models using XGBoost and Recurrent Neural Networks to forecast the impact of upcoming economic releases for the month, with model parameters and performance tracked in MLflow. The models were retrained at the end of each month to incorporate the latest data. Integrated the pipeline with Airflow for orchestration and MySQL for structured storage, ensuring scalable, reliable, and auditable workflows.
