onsite

Software Engineer, Data Infrastructure & Acquisition - Oxford, United Kingdom - Speechify

Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for real‑time text‑to‑speech services, leveraging Python, AWS, and Spark to power Speechify’s global platform.

About the role

Key Responsibilities

Architect and develop robust, high‑throughput data pipelines to ingest diverse content sources (PDFs, web pages, documents) into the data lake.
Implement ETL workflows using Python, Spark, and SQL, ensuring data quality, lineage, and compliance with privacy standards.
Collaborate with product and ML teams to expose clean, enriched datasets for downstream speech synthesis models.
Optimize pipeline performance and cost on AWS (S3, Glue, EMR, Redshift) and containerize services with Docker and Kubernetes.
Monitor, troubleshoot, and continuously improve pipeline reliability using observability tools.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services (S3, Glue, EMR, Redshift, Lambda).
Solid understanding of data modeling, ETL best practices, and data governance.
Excellent problem‑solving skills and a passion for building scalable, maintainable systems.

Skills

Python, AWS, SQL, Apache Spark, Docker

Company: Speechify

Department: Engineering

Location: Oxford, ENG, United Kingdom

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Posted June 19, 2026