onsite

Software Engineer, Data Infrastructure & Acquisition - Albuquerque, NM, USA - Speechify

Lead the design and maintenance of scalable data pipelines that ingest, transform, and store content for a global text‑to‑speech platform, leveraging Python, AWS, and Spark to ensure high‑quality, real‑time data availability.

About the role

Key Responsibilities

Architect and implement robust data ingestion pipelines that process diverse content types (PDFs, books, web pages) into structured formats for downstream services.
Optimize ETL workflows using Python, SQL, and Apache Spark to handle petabyte‑scale datasets with minimal latency.
Collaborate with cross‑functional teams to define data models, schema evolution, and data quality standards.
Deploy and maintain data infrastructure on AWS (S3, Redshift, Glue, Lambda), ensuring high availability and cost efficiency.
Monitor pipeline performance, troubleshoot issues, and implement automated alerts and recovery mechanisms.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services (S3, Redshift, Glue, Lambda, Athena).
Solid understanding of data modeling, schema design, and data quality best practices.
Excellent problem‑solving skills and a passion for building reliable, scalable systems.

Skills

Python, AWS, SQL, Apache Spark

Company: Speechify

Department: Engineering

Location: Albuquerque, NM, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026