onsite
Software Engineer, Data Infrastructure & Acquisition - Cardiff, United Kingdom - Speechify
Software Engineer
Lead the design and implementation of scalable data pipelines and infrastructure to support Speechify’s growing text‑to‑speech services, leveraging Python, AWS, and big‑data tools to ingest, transform, and store large volumes of content efficiently.
About the role
Key Responsibilities
- Design, build, and maintain robust data ingestion pipelines that process diverse content types (PDFs, books, web pages) into structured formats for downstream services.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to support real‑time and batch processing.
- Collaborate with cross‑functional teams to define data models, schema evolution, and data quality standards.
- Optimize pipeline performance, monitor throughput, and troubleshoot production issues using monitoring tools and log analysis.
- Drive automation of data workflows and contribute to continuous integration/continuous deployment (CI/CD) pipelines for data infrastructure.
Requirements
- 3+ years of experience building data pipelines in a cloud environment, preferably AWS.
- Hands‑on experience with Kafka or similar streaming platforms for real‑time data ingestion.
- Strong understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a collaborative mindset in a distributed team setting.
Skills
Python, AWS, SQL, Apache Spark, Kafka