onsite
Software Engineer, Data Infrastructure & Acquisition - San Diego, CA, USA - Speechify
Build scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and serve large volumes of content across multiple platforms.
About the role
Key Responsibilities
- Design, develop, and maintain robust data ingestion pipelines that process PDFs, books, and web content into structured formats for downstream TTS services.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to support real‑time and batch processing.
- Collaborate with cross‑functional teams to define data models, optimize query performance, and ensure data quality across the platform.
- Deploy and manage containerized services on Kubernetes, ensuring high availability and efficient resource utilization.
- Monitor pipeline health, troubleshoot issues, and continuously improve system reliability and performance.
Requirements
- 3+ years of experience building data pipelines in a cloud environment, preferably AWS.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with Kubernetes and container orchestration.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a collaborative mindset.
Skills
Python, AWS, SQL, Apache Spark, Kubernetes