onsite
Databricks Developer / Data Architect
Software Engineer
Design, develop, and optimize end‑to‑end data pipelines on Databricks and Spark, leveraging AWS and Azure services while orchestrating workflows with Apache Airflow.
About the role
Key Responsibilities
- Design and implement scalable data pipelines and lakehouse solutions using Databricks and Apache Spark.
- Develop, schedule, and monitor workflows with Apache Airflow in cloud environments.
- Integrate data sources from AWS (S3, Redshift, Glue) and Azure (Data Lake, Synapse) into unified analytics platforms.
- Collaborate with data scientists and analysts to provide clean, well‑documented datasets for machine‑learning and reporting.
- Optimize performance, cost, and reliability of data processing jobs through tuning, caching, and resource management.
Requirements
- 3+ years of hands‑on experience with Databricks, Apache Spark, and Airflow.
- Strong proficiency with AWS and Azure data engineering services.
- Solid understanding of data modeling, lakehouse architecture, and ETL best practices.
- Proficiency in SQL and at least one programming language such as Python or Scala.
- Experience with CI/CD pipelines and infrastructure‑as‑code tools (e.g., Terraform, CloudFormation) is a plus.
Skills
Databricks, Apache Spark, AWS, Azure