Onsite
Software Engineer, Data Infrastructure & Acquisition - Austin, TX, USA - Speechify
Software Engineer
Lead the design and scaling of Speechify’s data ingestion and processing pipelines, leveraging Python, AWS, and Spark to transform diverse content into high‑quality audio streams for millions of users.
About the role
Key Responsibilities
- Architect and maintain robust data ingestion pipelines that collect and normalize content from PDFs, books, Google Docs, news sites, and web pages.
- Implement scalable ETL workflows using Python, SQL, and Apache Spark on AWS to support real‑time and batch processing.
- Collaborate with cross‑functional teams to define data models, quality metrics, and performance benchmarks.
- Optimize pipeline performance, reduce latency, and ensure high availability through monitoring, alerting, and automated recovery.
- Document architecture, code, and best practices for internal use and future onboarding.
Requirements
- 5+ years of experience building production data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS services such as S3, Glue, EMR, Lambda, and Redshift.
- Solid understanding of data modeling, schema design, and data quality principles.
- Excellent problem‑solving skills and a passion for delivering reliable, high‑performance data solutions.
Skills
Python, AWS, SQL, Apache Spark