On-site
Software Engineer, Data Infrastructure & Acquisition - Belfast, United Kingdom - Speechify
Lead the design and maintenance of scalable data pipelines and infrastructure, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform.
About the role
Key Responsibilities
- Design, build, and optimize end‑to‑end data pipelines that ingest PDFs, books, and web content into the Speechify ecosystem.
- Implement robust ETL processes using Python and Spark, ensuring data quality and consistency across services.
- Deploy and manage infrastructure on AWS (EC2, S3, Redshift, Glue), using CI/CD pipelines and infrastructure as code.
- Collaborate with data scientists and product teams to expose clean, high‑performance datasets for model training and analytics.
- Monitor pipeline health, troubleshoot failures, and continuously improve performance and cost efficiency.
Requirements
- 3+ years of experience in data engineering or related roles.
- Strong proficiency in Python, SQL, and Apache Spark.
- Hands‑on experience with AWS services (S3, Redshift, Glue, Lambda).
- Solid understanding of ETL best practices and data modeling.
- Excellent problem‑solving skills and a passion for building scalable, reliable systems.
Skills
Python, AWS, SQL, Apache Spark