Data Engineer - Redhorse International (Onsite)
Data Engineer with end‑to‑end pipeline experience, skilled in Python, Spark, Kafka and graph databases, building scalable, highly available data solutions on AWS.
About the role
Key Responsibilities
- Design, develop, and maintain robust data pipelines that ingest, transform, and load large‑scale datasets from diverse sources.
- Implement and optimise graph data models using Neo4j or similar technologies to enable advanced relationship analytics.
- Deploy and manage streaming solutions with Kafka to ensure real‑time data availability.
- Leverage Apache Spark for batch and micro‑batch processing, ensuring performance and scalability.
- Collaborate with data scientists and analysts to provide clean, well‑documented datasets for downstream analytics and machine‑learning workloads.
Requirements
- 3+ years of hands‑on experience across the full data‑engineering lifecycle, including pipeline design, ETL/ELT, and orchestration.
- Proficiency in Python and SQL, with strong knowledge of Spark (PySpark or Scala) and Kafka.
- Experience building and querying graph databases such as Neo4j.
- Solid understanding of cloud platforms, preferably AWS (S3, EMR, Lambda, Glue, etc.).
- Ability to work in highly available, production‑grade environments and troubleshoot performance issues.
Skills
Python, Apache Spark, Kafka, SQL, Neo4j, AWS