Onsite

Software Engineer, Data Infrastructure & Acquisition - Ann Arbor, MI, USA - Speechify

Software Engineer

Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for text‑to‑speech services, leveraging Python, AWS, and distributed processing frameworks to ensure high availability and performance.

About the role

Key Responsibilities

Architect and build robust, fault‑tolerant data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into a unified data lake.
Implement ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to transform and enrich data for downstream consumption.
Integrate real‑time streaming data with Kafka and Kinesis to support live content ingestion and analytics.
Collaborate with cross‑functional teams to define data models, schema evolution, and metadata management.
Monitor pipeline performance, troubleshoot issues, and continuously optimize for cost and latency.

Requirements

5+ years of experience building production‑grade data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and distributed processing frameworks (Spark, Flink).
Hands‑on experience with AWS data services (S3, Glue, Redshift, Athena) and streaming platforms (Kafka, Kinesis).
Solid understanding of data modeling, schema design, and data quality best practices.
Excellent problem‑solving skills and a passion for building scalable, maintainable systems.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Ann Arbor, MI, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026