onsite

Software Engineer, Data Infrastructure & Acquisition - Rochester, NY, USA - Speechify

Software Engineer

Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and Spark to deliver reliable, high‑throughput data infrastructure.

About the role

Key Responsibilities

Design, build, and maintain end‑to‑end data pipelines that ingest PDFs, books, and web content into the Speechify ecosystem.
Implement robust ETL processes using Python, SQL, and Apache Spark to transform raw data into analytics‑ready formats.
Deploy and manage pipeline components on AWS (S3, Glue, Redshift, Lambda) ensuring high availability and scalability.
Collaborate with data scientists and product teams to define data models and optimize query performance.
Monitor pipeline health, troubleshoot failures, and continuously improve reliability and performance.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services (S3, Glue, Redshift, Lambda, ECS/EKS).
Solid understanding of Docker, Kubernetes, and CI/CD practices for data workloads.
Excellent problem‑solving skills and a passion for building scalable, maintainable systems.

Skills

Python, AWS, SQL, Apache Spark, Docker

Company: Speechify

Department: Engineering

Location: Rochester, NY, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted: June 21, 2026