onsite
Software Engineer, Data Infrastructure & Acquisition - Charlotte, NC, USA - Speechify
Software Engineer
Lead the design and scaling of Speechify’s data ingestion and processing pipelines, using Python, AWS, and Spark to prepare diverse content for conversion into high‑quality audio for millions of users.
About the role
Key Responsibilities
- Architect and maintain robust data pipelines that ingest PDFs, books, web pages, and other text sources into the Speechify platform.
- Implement scalable ETL workflows using Python, SQL, and Apache Spark on AWS services (S3, Glue, Redshift).
- Collaborate with product and ML teams to ensure data quality, consistency, and availability for downstream text‑to‑speech models.
- Optimize pipeline performance, monitor throughput, and troubleshoot production incidents.
- Document architecture, data schemas, and best practices for the engineering team.
Requirements
- 5+ years of experience building data infrastructure in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS services such as S3, Glue, Redshift, and Lambda.
- Solid understanding of ETL concepts, data modeling, and performance tuning.
- Excellent communication skills and a collaborative mindset.
Skills
Python, AWS, SQL, Apache Spark