onsite
Data Engineer - P2PSoftTek Inc
Data Engineer responsible for designing, building, and maintaining scalable ETL/ELT pipelines with Spark, managing data lake solutions on S3, and implementing CDC workflows for Redshift and Snowflake environments.
About the role
Key Responsibilities
- Design, develop, and maintain high‑performance ETL/ELT pipelines using Apache Spark (PySpark or Scala).
- Build and operate data lake architectures on Amazon S3, using the Parquet file format and the Apache Iceberg table format for efficient storage and querying.
- Implement Change Data Capture (CDC) pipelines to deliver real‑time or near‑real‑time data feeds into Amazon Redshift and Snowflake warehouses.
- Collaborate with data analysts and scientists to understand data requirements and translate them into robust data models.
- Monitor, troubleshoot, and optimize pipeline performance, ensuring data quality, reliability, and scalability.
Requirements
- Strong experience with Apache Spark, including hands‑on development in PySpark or Scala.
- Proficiency in building data lakes on Amazon S3 and working with Parquet files and Apache Iceberg tables.
- Hands‑on experience with Amazon Redshift and Snowflake, including data loading and performance tuning.
- Knowledge of CDC techniques and tools for streaming or batch data replication.
- Solid understanding of SQL, data modeling, and best practices for ETL/ELT pipeline design.
Skills
Apache Spark, Scala, Snowflake