remote
Senior Software Engineer, Data Engineering - GitHub
Software Engineer
Senior software engineer in data engineering, responsible for designing, building, and scaling robust data pipelines and platforms with Python, Spark, Kafka, and AWS to support analytics and machine‑learning workloads.
About the role
Key Responsibilities
- Design, develop, and maintain high‑performance data pipelines that ingest, transform, and store large‑scale event streams.
- Collaborate with product, analytics, and machine‑learning teams to define data models and schema evolution strategies.
- Implement and optimize ETL processes using Python, SQL, and Apache Spark on AWS services such as EMR, S3, and Redshift.
- Build reliable streaming solutions with Kafka, ensuring low latency and fault tolerance.
- Drive best practices for data quality, monitoring, and observability across the data platform.
Requirements
- 5+ years of professional experience in data engineering or backend software development.
- Strong proficiency in Python, SQL, and distributed processing frameworks (e.g., Apache Spark).
- Hands‑on experience with streaming technologies such as Kafka and with cloud platforms, preferably AWS.
- Demonstrated ability to design scalable data models and build end‑to‑end ETL pipelines.
- Excellent problem‑solving skills and the ability to work autonomously in a remote, collaborative environment.
Skills
Python, SQL, Apache Spark, Kafka, AWS