onsite
Software Engineer, Data Infrastructure & Acquisition - Sydney, Australia - Speechify
Lead the design and implementation of scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content data.
About the role
Key Responsibilities
- Design, build, and maintain robust data ingestion pipelines that process PDFs, books, and web content into structured formats for downstream TTS services.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift).
- Collaborate with data scientists and product teams to define data models, schemas, and quality metrics.
- Optimize pipeline performance, monitor job health, and troubleshoot production issues.
- Automate deployment and scaling of data services with Docker and Kubernetes.
Requirements
- 3+ years of experience in data engineering or a related role.
Skills
Python, AWS, SQL, Apache Spark, Docker