onsite
Senior Data Engineer - Data Science Platform
Data Scientist
About the role
As a Senior Data Engineer, you will design, build, and optimize large‑scale data pipelines on Azure, using Apache Spark and Azure Data Factory to support a data science platform.
Key Responsibilities
- Design and implement end‑to‑end data pipelines using Apache Spark and Azure Data Factory to ingest, transform, and store massive datasets.
- Develop and maintain data lake architectures on Azure Data Lake Storage, ensuring high availability and security.
- Collaborate with data scientists and analysts to provide reliable, well‑documented data sources for machine‑learning models and analytics.
- Optimize the performance and cost of data processing jobs by monitoring workloads and tuning Spark configurations.
- Implement data governance, lineage, and quality checks across the platform.
Requirements
- 5+ years of experience building data pipelines on Azure, with deep expertise in Apache Spark.
- Proficiency with Azure Data Factory and Azure Data Lake Storage.
- Strong SQL and programming skills (Python/Scala) for data transformation and automation.
- Experience with data modeling, ETL best practices, and performance tuning in distributed environments.
- Solid understanding of data security, governance, and CI/CD for data engineering workflows.