Onsite

Software Engineer, Data Infrastructure & Acquisition - Columbus, OH, USA - Speechify

Lead the design and implementation of scalable data pipelines that ingest, transform, and store large volumes of content for real‑time text‑to‑speech services, leveraging Python, AWS, and distributed processing frameworks.

About the role

Key Responsibilities

Architect and build robust, fault‑tolerant data pipelines to ingest PDFs, books, and web content at scale.
Implement ETL workflows using Python, SQL, and Apache Spark on AWS services (S3, Glue, Redshift).
Integrate streaming data sources with Kafka to support real‑time content processing.
Collaborate with product and ML teams to expose clean, high‑quality datasets for downstream services.
Monitor pipeline performance, troubleshoot issues, and continuously optimize throughput and cost.

Requirements

5+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and distributed data processing (Spark, Flink).
Hands‑on experience with AWS data services (S3, Glue, Redshift, EMR).
Solid understanding of Kafka or similar streaming platforms.
Excellent problem‑solving skills and a passion for clean, maintainable code.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Columbus, OH, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 200,000

Posted June 21, 2026