onsite
Big Data Developer
Software Engineer
Design, develop, and optimize large‑scale data pipelines using Amazon EMR, Apache Hive, Spark, and Azure Databricks, delivering high‑performance analytics solutions for complex business needs.
About the role
Key Responsibilities
- Design and implement end‑to‑end data pipelines on Amazon EMR and Azure Databricks, leveraging Apache Hive and Spark for batch and streaming workloads.
- Develop, tune, and maintain SQL queries and data models to support analytics and reporting requirements.
- Collaborate with data scientists and product teams in an Agile environment to translate business requirements into scalable data solutions.
- Monitor, troubleshoot, and optimize performance of big‑data jobs, ensuring reliability and cost‑efficiency.
- Implement data governance, security, and quality standards across all data processing layers.
Requirements
- 3+ years of hands‑on experience with Amazon EMR, Apache Hive, and Apache Spark.
- Proficiency in Azure Databricks and cloud‑based data platform services.
- Strong SQL skills and experience with Python (or Scala) for data transformation.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Experience working in Agile teams and using Git-based version control workflows.
Skills
Apache Spark, SQL, Python