onsite
Software Engineer, Data Infrastructure & Acquisition - Fort Lauderdale, FL, USA - Speechify
Lead the design and implementation of scalable data pipelines and infrastructure that support Speechify’s text‑to‑speech services, using Python, AWS, and Spark to ingest, transform, and store large volumes of content data.
About the role
Key Responsibilities
- Design, develop, and maintain robust data ingestion pipelines that process diverse content sources (PDFs, web pages, documents) into structured formats for downstream services.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to support real‑time and batch processing.
- Collaborate with data scientists and product teams to define data models, optimize query performance, and ensure data quality across the platform.
- Monitor pipeline health, troubleshoot failures, and continuously improve reliability and performance through automation and best practices.
- Document architecture, data flows, and operational procedures for internal knowledge sharing.
Requirements
- 5+ years of experience building production‑grade data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS data services (S3, Glue, Redshift, Lambda).
- Solid understanding of data modeling, schema design, and performance tuning.
- Excellent problem‑solving skills and a collaborative mindset in a distributed team setting.
Skills
Python, AWS, SQL, Apache Spark