onsite
Software Engineer, Data Infrastructure & Acquisition - Reading, United Kingdom - Speechify
Software Engineer
About the role
Lead the design and implementation of scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, and modern streaming/ETL technologies.
Key Responsibilities
- Architect and build robust, scalable data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into the data lake.
- Implement ETL workflows using Python, Spark, and SQL to transform and enrich data for downstream analytics and recommendation engines.
- Design and maintain real‑time streaming pipelines with Kafka and AWS services (Kinesis, S3, Redshift) to support low‑latency data delivery.
- Collaborate with cross‑functional teams to define data quality standards, monitoring, and alerting for production workloads.
- Optimize pipeline performance and cost through resource tuning, partitioning strategies, and efficient data storage.
Requirements
- 5+ years of experience building production data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS data services (S3, Redshift, Glue, Kinesis) and streaming platforms like Kafka.
- Solid understanding of data modeling, schema design, and data governance best practices.
- Excellent problem‑solving skills and a passion for building reliable, maintainable data infrastructure.
Skills
Python, AWS, SQL, Apache Spark, Kafka