onsite

Software Engineer, Data Infrastructure & Acquisition - Austin, TX, USA - Speechify

Software Engineer

Lead the design and scaling of Speechify’s data ingestion and processing pipelines, using Python, AWS, and Spark to transform diverse content into high‑quality audio streams for millions of users.

About the role

Key Responsibilities

Architect and maintain robust data ingestion pipelines that collect and normalize content from PDFs, books, Google Docs, news sites, and web pages.
Implement scalable ETL workflows using Python, SQL, and Apache Spark on AWS to support real‑time and batch processing.
Collaborate with cross‑functional teams to define data models, quality metrics, and performance benchmarks.
Optimize pipeline performance, reduce latency, and ensure high availability through monitoring, alerting, and automated recovery.
Document architecture, code, and best practices for internal use and future onboarding.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services such as S3, Glue, EMR, Lambda, and Redshift.
Solid understanding of data modeling, schema design, and data quality principles.
Excellent problem‑solving skills and a passion for delivering reliable, high‑performance data solutions.

Skills

Python, AWS, SQL, Apache Spark

Company: Speechify

Department: Engineering

Location: Austin, TX, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026