onsite
Software Engineer, Data Infrastructure & Acquisition - Stamford, CT, USA - Speechify
Software Engineer
About the role
Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and modern data engineering tools to ensure high‑quality, real‑time data availability.
Key Responsibilities
- Design, build, and maintain robust data ingestion pipelines that process diverse content types (PDFs, books, web pages) into structured formats for downstream services.
- Implement ETL workflows using Python, Apache Spark, and Airflow, ensuring data quality, lineage, and performance at scale.
- Collaborate with product and engineering teams to define data models, plan schema evolution, and choose storage strategies on AWS (S3, Redshift, Athena).
- Monitor pipeline health, troubleshoot failures, and optimize resource usage to meet SLAs and cost targets.
- Document architecture, data flows, and best practices for internal knowledge sharing.
Requirements
- 3+ years of experience in data engineering or related roles, with a strong foundation in Python and SQL.
- Hands‑on experience building data pipelines on AWS, using services such as S3, Redshift, Athena, and Glue.
- Proficiency with Spark for large‑scale data processing and Airflow for workflow orchestration.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a collaborative mindset in a distributed team environment.
Skills
Python, AWS, SQL, Apache Spark, Airflow