remote
Staff Data Engineer - RBC
Data Engineer
Senior data engineer leading the design and implementation of real‑time, distributed data pipelines and platforms using Python, Spark, Kafka, Airflow and cloud services to enable enterprise‑wide analytics and decision making.
About the role
Key Responsibilities
- Architect, develop, and maintain scalable, real‑time data pipelines and data lake solutions on cloud infrastructure.
- Design and implement streaming workflows using Apache Kafka and batch processing with Apache Spark.
- Build, schedule, and monitor ETL workflows in Apache Airflow, ensuring data quality and reliability.
- Collaborate with data scientists, analysts, and product owners to translate business requirements into robust data models and APIs.
- Optimize SQL queries and data storage patterns for performance and cost efficiency.
- Mentor junior engineers and promote best practices in code review, testing, and documentation.
Requirements
- 5+ years of professional experience building large‑scale data pipelines in Python or Scala.
- Deep expertise with Apache Spark (Structured Streaming or batch) and Kafka for high‑throughput data processing.
- Strong SQL skills and experience with relational and columnar data stores (e.g., PostgreSQL, Snowflake, Redshift).
- Hands‑on experience with workflow orchestration tools such as Apache Airflow.
- Proficiency with AWS, including services such as S3, EMR, Kinesis, and IAM.
Skills
python, apache spark, kafka, sql, airflow