Onsite

Software Engineer, Data Infrastructure & Acquisition - San Diego, CA, USA - Speechify

Software Engineer

Build scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and serve large volumes of content across multiple platforms.

About the role

Key Responsibilities

Design, develop, and maintain robust data ingestion pipelines that process PDFs, books, and web content into structured formats for downstream TTS services.
Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to support real‑time and batch processing.
Collaborate with cross‑functional teams to define data models, optimize query performance, and ensure data quality across the platform.
Deploy and manage containerized services on Kubernetes, ensuring high availability and efficient resource utilization.
Monitor pipeline health, troubleshoot issues, and continuously improve system reliability and performance.

Requirements

3+ years of experience building data pipelines in a cloud environment, preferably AWS.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with Kubernetes and container orchestration.
Solid understanding of data modeling, ETL best practices, and performance tuning.
Excellent problem‑solving skills and a collaborative mindset.

Skills

Python, AWS, SQL, Apache Spark, Kubernetes

Company: Speechify

Department: Engineering

Location: San Diego, CA, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026