onsite

Software Engineer, Data Infrastructure & Acquisition - Calgary, Canada - Speechify


Lead the design and implementation of scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content data.

About the role

Key Responsibilities

Design, build, and maintain robust data ingestion pipelines that process PDFs, books, and web content into structured formats for downstream TTS services.
Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift).
Optimize data storage and query performance with SQL and columnar databases, ensuring low latency for real‑time content delivery.
Collaborate with cross‑functional teams to define data models, schema evolution, and data quality standards.
Automate deployment and monitoring of data infrastructure using Docker, Kubernetes, and CI/CD pipelines.

Requirements

5+ years of experience in data engineering or related roles, with a strong background in Python and SQL.
Proven expertise in building large‑scale data pipelines on AWS, including Glue, Redshift, and S3.
Hands‑on experience with Apache Spark for distributed data processing.
Familiarity with containerization (Docker) and orchestration (Kubernetes) for data workloads.
Excellent problem‑solving skills and a passion for building reliable, high‑performance data systems.

Skills

Python, AWS, SQL, Apache Spark, Docker

Company: Speechify

Department: Engineering

Location: Calgary, Canada

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026