onsite

Software Engineer, Data Infrastructure & Acquisition - Reading, United Kingdom - Speechify

Software Engineer

Lead the design and implementation of scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, and modern streaming/ETL technologies.

About the role

Key Responsibilities

Architect and build robust, scalable data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into the data lake.
Implement ETL workflows using Python, Spark, and SQL to transform and enrich data for downstream analytics and recommendation engines.
Design and maintain real‑time streaming pipelines with Kafka and AWS services (Kinesis, S3, Redshift) to support low‑latency data delivery.
Collaborate with cross‑functional teams to define data quality standards, monitoring, and alerting for production workloads.
Optimize pipeline performance and cost through resource tuning, partitioning strategies, and efficient data storage.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS data services (S3, Redshift, Glue, Kinesis) and streaming platforms like Kafka.
Solid understanding of data modeling, schema design, and data governance best practices.
Excellent problem‑solving skills and a passion for building reliable, maintainable data infrastructure.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Reading, ENG, United Kingdom

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026