onsite
Software Engineer, Data Infrastructure & Acquisition - Rochester, NY, USA - Speechify
Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and Spark to deliver reliable, high‑throughput data infrastructure.
About the role
Key Responsibilities
- Design, build, and maintain end‑to‑end data pipelines that ingest PDFs, books, and web content into the Speechify ecosystem.
- Implement robust ETL processes using Python, SQL, and Apache Spark to transform raw data into analytics‑ready formats.
- Deploy and manage pipeline components on AWS (S3, Glue, Redshift, Lambda), ensuring high availability and scalability.
- Collaborate with data scientists and product teams to define data models and optimize query performance.
- Monitor pipeline health, troubleshoot failures, and continuously improve reliability and performance.
Requirements
- 5+ years of experience building production data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS services (S3, Glue, Redshift, Lambda, ECS/EKS).
- Solid understanding of Docker, Kubernetes, and CI/CD practices for data workloads.
- Excellent problem‑solving skills and a passion for building scalable, maintainable systems.
Skills
Python, AWS, SQL, Apache Spark, Docker