remote
Associate III - Data Engineering - UST
Software Engineer
Senior data engineer designing and maintaining scalable big‑data pipelines on Hadoop/Spark, optimizing storage, ensuring data quality and governance, and collaborating with data science teams on analytics and ML initiatives.
About the role
Key Responsibilities
- Design, develop, and maintain robust ETL workflows and big data pipelines using Apache Hadoop and Apache Spark.
- Build and optimize scalable data processing systems, ensuring high performance and reliability.
- Ingest data from diverse sources—databases, APIs, and streaming platforms—into cloud‑based data lakes.
- Implement and enforce data quality, security, and governance standards across the data lifecycle.
- Monitor, troubleshoot, and tune data processing jobs and infrastructure for performance and uptime.
- Collaborate with data scientists and business stakeholders to support analytics and machine learning projects.
Requirements
- Strong experience with Python, SQL, and big‑data frameworks (Hadoop, Spark).
- Proficiency in cloud data platforms (AWS, GCP, or Azure) and data lake architecture.
- Hands‑on knowledge of data ingestion, streaming, and batch processing pipelines.
- Solid understanding of data governance, security, and compliance best practices.
- Excellent problem‑solving skills and ability to work cross‑functionally in a fast‑paced environment.
Skills
Python, Apache Spark, SQL