onsite

Software Engineer, Data Infrastructure & Acquisition - Menlo Park, CA, USA - Speechify

Software Engineer

Lead the design and maintenance of scalable data pipelines and infrastructure, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform.

About the role

Key Responsibilities

Design, build, and optimize data ingestion pipelines that process millions of documents daily, ensuring high throughput and low latency.
Develop and maintain robust ETL workflows using Python, SQL, and Apache Spark, transforming raw data into analytics‑ready formats.
Collaborate with cross‑functional teams to define data models, schemas, and metadata standards for content, user behavior, and performance metrics.
Implement and manage AWS services (S3, Redshift, Glue, Lambda) to support scalable storage, processing, and serverless workloads.
Monitor pipeline health, troubleshoot failures, and continuously improve reliability and cost efficiency.

Requirements

5+ years of experience in data engineering or related roles, with a strong focus on large‑scale pipeline development.
Proficiency in Python, SQL, and Spark for data processing and transformation.
Hands‑on experience with AWS data services (S3, Redshift, Glue, Lambda, EMR).
Solid understanding of data modeling, ETL best practices, and performance tuning.
Excellent problem‑solving skills and a collaborative mindset in a distributed team environment.

Skills

Python, AWS, SQL, Apache Spark

Company: Speechify

Department: Engineering

Location: Menlo Park, CA, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026