onsite
Software Engineer, Data Infrastructure & Acquisition - Iowa City, IA, USA - Speechify
Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform, using Python, AWS, and Spark to deliver reliable, high‑throughput data infrastructure.
About the role
Key Responsibilities
- Design, build, and maintain end‑to‑end data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into the data lake.
- Implement robust ETL processes using Python, Spark, and SQL to transform and enrich data for downstream analytics and AI models.
- Automate workflow orchestration with Airflow, ensuring high availability, monitoring, and alerting for production pipelines.
- Collaborate with data scientists and product teams to define data schemas, quality metrics, and performance benchmarks.
- Optimize storage and compute costs on AWS (S3, Redshift, EMR) while maintaining data security and compliance.
Requirements
- 5+ years of experience building large‑scale data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Apache Spark.
- Hands‑on experience with AWS services (S3, Redshift, EMR, Glue) and workflow orchestration tools such as Airflow.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a passion for building reliable, scalable data infrastructure.
Skills
Python, AWS, SQL, Apache Spark, Airflow