Remote
IT Big Data Engineer - Monolithic Power Systems
Data Engineer
About the role
Lead the design, development, and maintenance of large-scale data pipelines and analytics solutions using Hadoop, Spark, and AWS services, ensuring high performance, reliability, and scalability for semiconductor industry data.
Key Responsibilities
- Design, implement, and optimize distributed data pipelines using Hadoop, Spark, and related technologies to ingest, process, and transform large volumes of semiconductor manufacturing data.
- Develop and maintain data models, ETL workflows, and data quality checks in Python and SQL, ensuring data integrity and compliance with industry standards.
- Collaborate with data scientists, product engineers, and DevOps teams to integrate real‑time streaming data via Kafka and other messaging systems into analytics platforms.
- Deploy and manage Big Data workloads on AWS (EMR, S3, Glue, Redshift) with a focus on cost efficiency, scalability, and security.
- Monitor system performance, troubleshoot issues, and implement continuous improvement practices to enhance data processing speed and reliability.
Requirements
- 5+ years of experience in Big Data engineering, with hands‑on expertise in Hadoop, Spark, and related ecosystems.
- Strong programming skills in Python and SQL, plus experience with data modeling and ETL design.
- Proven track record of deploying and managing Big Data solutions on AWS, including EMR, S3, Glue, and Redshift.
- Experience with real‑time data streaming using Kafka or similar platforms.
- Excellent problem‑solving abilities, strong communication skills, and a collaborative mindset.
Skills
Hadoop, Python, SQL, AWS, Kafka