onsite
Software Engineer, Data Infrastructure & Acquisition - Grand Rapids, MI, USA - Speechify
Software Engineer
About the role
Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and distributed processing tools to ensure high‑quality, real‑time data availability.
Key Responsibilities
- Design, build, and maintain robust data ingestion pipelines that process PDFs, books, web pages, and other content sources into structured formats for downstream services.
- Implement ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to transform raw data into analytics‑ready datasets.
- Integrate streaming data sources with Kafka to support real‑time content updates and feature flagging.
- Collaborate with data scientists and product teams to define data models, schema evolution, and performance tuning.
- Monitor pipeline health, troubleshoot failures, and optimize throughput and cost across the data stack.
Requirements
- 3+ years of experience building production‑grade data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and distributed processing frameworks such as Spark.
- Hands‑on experience with AWS data services (S3, Glue, Redshift, Athena) and Kafka or similar streaming platforms.
- Solid understanding of data modeling, schema design, and performance optimization.
- Excellent problem‑solving skills and a collaborative mindset in a distributed team setting.
Skills
Python, AWS, SQL, Apache Spark, Kafka