onsite

Software Engineer, Data Infrastructure & Acquisition - Santa Clara, CA, USA - Speechify

Lead the design and implementation of scalable data pipelines and infrastructure to ingest, process, and serve large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and distributed data tools.

About the role

Key Responsibilities

Architect, build, and maintain robust data ingestion pipelines that transform raw content from PDFs, books, and web sources into structured formats for downstream services.
Collaborate with cross‑functional teams to define data models, schemas, and quality standards that ensure high reliability and performance.
Optimize and scale batch and streaming workflows using Apache Spark, Kafka, and AWS services (S3, Glue, Redshift).
Implement monitoring, alerting, and automated testing to guarantee pipeline uptime and data integrity.
Drive continuous improvement by evaluating new technologies, refactoring legacy code, and sharing best practices across the engineering organization.

Requirements

5+ years of experience building production‑grade data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and distributed processing frameworks (Spark, Flink).
Hands‑on experience with AWS data services (S3, Glue, Redshift, Athena) and streaming platforms (Kafka, Kinesis).
Solid understanding of data modeling, ETL best practices, and performance tuning.
Excellent problem‑solving skills, ability to work independently in a distributed team, and a passion for clean, maintainable code.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Santa Clara, CA, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026