Data Engineer with less than a year in ETL pipelines and lakehouse architectures with 10 months of p
Data Engineer specializing in Python, SQL, Apache Spark, and Docker, with hands-on experience building scalable ETL pipelines, distributed data processing systems, and modern lakehouse architectures. Skilled in developing high-throughput data platforms, optimizing data storage and partitioning strategies, and deploying containerized workflows for analytics and machine learning applications.
University of Pretoria
BEng Mechanical Engineering
January 1, 2022 – December 1, 2023
Apache Iceberg (PyIceberg) Local Data Lakehouse
February 1, 2026 – February 1, 2026
Developed a local data lakehouse using Apache Iceberg (PyIceberg) for schema-aware, ACID-style table storage. Implemented Parquet time-series ingestion with deduplication, data-quality checks, and audit logging. Added a scheduler and a folder watcher so pipeline runs trigger automatically without any cloud dependencies, keeping the whole setup in a lightweight, Python-native environment for experimenting with modern lakehouse concepts.
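The ingestion step above (deduplication, quality checks, audit logging) can be illustrated with a minimal, stdlib-only Python sketch. It is not the project's actual code: the record fields (`symbol`, `ts`, `price`), the helpers `check_quality` and `ingest_batch`, and the in-memory audit log are all hypothetical, and reading Parquet and writing to the Iceberg table through PyIceberg are deliberately left out.

```python
from datetime import datetime, timezone

# Hypothetical record shape: {"symbol": str, "ts": str (ISO 8601), "price": float}
REQUIRED_FIELDS = ("symbol", "ts", "price")


def check_quality(record):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing {field}")
    price = record.get("price")
    if price is not None and price <= 0:
        problems.append("non-positive price")
    return problems


def ingest_batch(records, seen_keys, audit_log):
    """Validate and deduplicate a batch, then append a summary to audit_log.

    seen_keys holds (symbol, ts) pairs already stored, so re-ingesting
    the same file does not insert duplicate rows.
    """
    accepted, rejected, duplicates = [], [], 0
    for record in records:
        problems = check_quality(record)
        if problems:
            rejected.append((record, problems))
            continue
        key = (record["symbol"], record["ts"])
        if key in seen_keys:
            duplicates += 1
            continue
        seen_keys.add(key)
        accepted.append(record)
    audit_log.append({
        "run_at": datetime.now(timezone.utc).isoformat(),
        "accepted": len(accepted),
        "rejected": len(rejected),
        "duplicates": duplicates,
    })
    return accepted, rejected


seen, log = set(), []
batch = [
    {"symbol": "EURUSD", "ts": "2026-02-01T00:00:00Z", "price": 1.08},
    {"symbol": "EURUSD", "ts": "2026-02-01T00:00:00Z", "price": 1.08},  # duplicate
    {"symbol": "EURUSD", "ts": "2026-02-01T00:01:00Z", "price": None},  # fails check
]
accepted, rejected = ingest_batch(batch, seen, log)
print(len(accepted), len(rejected), log[-1]["duplicates"])  # 1 1 1
```

Running quality checks before the deduplication lookup means a record missing its key fields is rejected cleanly instead of raising a `KeyError`.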
Big Data Project Jan 2026 (Tick Data)
December 1, 2025 – February 1, 2026
Architected and deployed a scalable data lakehouse using Apache Iceberg, object storage, and a JDBC-backed catalog to store large-scale, versioned tick data with full ACID guarantees. Built an end-to-end, partitioned ETL pipeline that ingests, validates, cleans, transforms, and optimizes high-volume, high-frequency market tick data into analytics- and ML-ready feature tables at multi-million-record scale. Designed a production-style workflow that integrates NiFi for ingestion, Spark for parallel processing, and ML feature engineering to support model training and real-time signal generation.
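The validate-and-partition step described above can be sketched in plain Python. This is an illustration under assumptions, not the project's NiFi/Spark code: the tick tuple layout, the crossed-quote validation rule, and the helpers `partition_key` and `partition_ticks` are hypothetical. Iceberg itself uses hidden partitioning (partition transforms such as `day(ts)` recorded in table metadata), so the Hive-style `symbol=.../date=...` path below only stands in for the idea of grouping rows by partition.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical tick shape: (symbol, ISO-8601 timestamp with offset, bid, ask)


def partition_key(symbol, ts):
    """Build a Hive-style partition path from the symbol and the tick's date.

    Timestamps are assumed to be UTC (+00:00), so .date() is the UTC date.
    """
    day = datetime.fromisoformat(ts).date().isoformat()
    return f"symbol={symbol}/date={day}"


def partition_ticks(ticks):
    """Validate ticks and group the valid ones by partition path.

    A tick is dropped when its bid exceeds its ask (a crossed quote),
    one simple example of a validation rule.
    """
    partitions = defaultdict(list)
    dropped = 0
    for symbol, ts, bid, ask in ticks:
        if bid > ask:
            dropped += 1
            continue
        partitions[partition_key(symbol, ts)].append((ts, bid, ask))
    # Sort each partition by timestamp so downstream feature code sees ordered rows;
    # string order matches time order because every timestamp shares the same offset
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0])
    return dict(partitions), dropped


ticks = [
    ("EURUSD", "2026-01-05T09:30:01+00:00", 1.0841, 1.0843),
    ("EURUSD", "2026-01-05T09:30:00+00:00", 1.0840, 1.0842),
    ("EURUSD", "2026-01-06T00:00:00+00:00", 1.0850, 1.0849),  # crossed quote, dropped
    ("GBPUSD", "2026-01-05T09:30:00+00:00", 1.2700, 1.2702),
]
parts, dropped = partition_ticks(ticks)
print(sorted(parts), dropped)
# ['symbol=EURUSD/date=2026-01-05', 'symbol=GBPUSD/date=2026-01-05'] 1
```

In a Spark job the same idea is usually expressed as a column-level filter plus a partitioned write, which lets the work run in parallel across executors instead of in a single Python loop.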
Economic News Data Pipeline
June 1, 2025 – November 1, 2025
Built a full end-to-end ETL pipeline on MQL5 monthly economic release data, automating the extraction, transformation, and loading of economic events. Developed predictive models using XGBoost and recurrent neural networks (RNNs) to forecast the impact of each month's upcoming economic releases, with model parameters and performance tracked in MLflow. Retrained the models at the end of each month to incorporate the latest data. Orchestrated the pipeline with Airflow and stored structured data in MySQL, producing a scalable, reliable, and auditable workflow.
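The end-of-month retraining cadence can be sketched as a small scheduling rule. This stdlib-only example is hypothetical: the daily run, the step names, and the helpers `is_last_day_of_month` and `plan_run` are assumptions for illustration. In the project itself scheduling was handled by Airflow, whose DAG definitions are not shown here.

```python
import calendar
from datetime import date


def is_last_day_of_month(day):
    """True when `day` is the final calendar day of its month (leap years included)."""
    _, days_in_month = calendar.monthrange(day.year, day.month)
    return day.day == days_in_month


def plan_run(run_date):
    """Return the steps a hypothetical daily job would execute on run_date."""
    steps = ["extract", "transform", "load"]
    if is_last_day_of_month(run_date):
        # Retrain only after the month's releases have all been loaded
        steps.append("retrain_models")
    return steps


print(plan_run(date(2025, 6, 15)))  # ['extract', 'transform', 'load']
print(plan_run(date(2025, 6, 30)))  # ['extract', 'transform', 'load', 'retrain_models']
```

`calendar.monthrange` returns the weekday of the first day and the number of days in the month, so the rule needs no hard-coded month lengths.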
NSC certificate
Unknown
January 1, 2021 – Present
Cultural Fit Analysis
The candidate's projects demonstrate a strong interest in modern data engineering concepts, particularly around data lakehouses and real-time data processing. The personal nature of all projects suggests self-motivation and a drive to learn and build. The breadth of skills across data ingestion, processing, storage, and ML integration aligns well with a dynamic data engineering environment. However, the lack of team-based project experience or professional roles makes it challenging to fully assess cultural fit in a collaborative work setting.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a proactive, experimental approach to learning new technologies (e.g., using PyIceberg to build a local lakehouse). The focus on automation, reliability, and auditability in the pipelines suggests an operational mindset. However, without direct work experience, it is difficult to assess collaboration, stress handling, or communication in a team setting.