onsite

Software Engineer, Data Infrastructure & Acquisition - Phoenix, AZ, USA - Speechify

Software Engineer

Lead the design and implementation of scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, and distributed processing tools.

About the role

Key Responsibilities

Architect and build robust, scalable data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into the data lake.
Implement ETL workflows using Python, Spark, and SQL to clean, enrich, and transform data for downstream analytics and model training.
Collaborate with data scientists and product teams to define data schemas, quality metrics, and performance benchmarks.
Optimize pipeline performance and cost on AWS (S3, Glue, Redshift, EMR) while ensuring high availability and fault tolerance.
Monitor, troubleshoot, and continuously improve data ingestion, processing, and storage solutions.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and distributed processing frameworks (Spark, Flink).
Hands‑on experience with AWS services (S3, Glue, Redshift, EMR, Lambda).
Solid understanding of data modeling, ETL best practices, and data quality principles.
Excellent problem‑solving skills and a passion for building reliable, scalable data infrastructure.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Phoenix, AZ, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026