onsite

Software Engineer, Data Infrastructure & Acquisition - Eugene, OR, USA - Speechify

About the role

Lead the design and implementation of scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content data.

Key Responsibilities

Design, build, and maintain robust data pipelines that ingest content from diverse sources (PDFs, web pages, documents) into the data lake.
Implement ETL processes using Python and Apache Spark to clean, enrich, and transform raw data for downstream analytics and model training.
Collaborate with data scientists and product teams to define data schemas, quality metrics, and performance benchmarks.
Optimize pipeline performance and cost on AWS (S3, Glue, Redshift, Athena) while ensuring high availability and fault tolerance.
Monitor, troubleshoot, and continuously improve data workflows, implementing automated alerts and logging.

Requirements

3+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services (S3, Glue, Redshift, Athena, Lambda).
Solid understanding of data modeling, ETL best practices, and data quality principles.
Excellent problem‑solving skills and a collaborative mindset in a distributed team setting.

Skills

Python, AWS, SQL, Apache Spark

Company: Speechify

Department: Engineering

Location: Eugene, OR, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026