onsite

Software Engineer, Data Infrastructure & Acquisition - Boston, MA, USA - Speechify

Software Engineer

Lead the design and scaling of Speechify’s data ingestion and processing pipelines, leveraging Python, AWS, Spark, and SQL to ensure reliable, high‑throughput data flows for our text‑to‑speech services.

About the role

Key Responsibilities

Architect, build, and maintain scalable data pipelines that ingest, transform, and store large volumes of text and metadata from diverse sources (PDFs, web pages, documents).
Implement robust ETL workflows using Python, Spark, and AWS services (S3, Glue, Redshift, Lambda) to support real‑time and batch processing.
Collaborate with data scientists and product teams to define data models, optimize query performance, and ensure data quality across the platform.
Monitor pipeline health, troubleshoot failures, and continuously improve reliability and latency.
Document architecture, processes, and best practices for internal use and future onboarding.

Requirements

3+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services (S3, Glue, Redshift, Lambda, Kinesis).
Solid understanding of data modeling, ETL best practices, and performance tuning.
Excellent problem‑solving skills and a collaborative mindset.

Skills

Python, AWS, SQL

Company: Speechify

Department: Engineering

Location: Boston, MA, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026