onsite
Software Engineer, Data Infrastructure & Acquisition - Calgary, Canada - Speechify
About the role
Lead the design and implementation of scalable data pipelines and infrastructure to support Speechify’s text‑to‑speech services, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content data.
Key Responsibilities
- Design, build, and maintain robust data ingestion pipelines that process PDFs, books, and web content into structured formats for downstream TTS services.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift).
- Optimize data storage and query performance with SQL and columnar databases, ensuring low latency for real‑time content delivery.
- Collaborate with cross‑functional teams to define data models, schema evolution, and data quality standards.
- Automate deployment and monitoring of data infrastructure using Docker, Kubernetes, and CI/CD pipelines.
Requirements
- 5+ years of experience in data engineering or related roles, with a strong background in Python and SQL.
- Proven expertise in building large‑scale data pipelines on AWS, including Glue, Redshift, and S3.
- Hands‑on experience with Apache Spark for distributed data processing.
- Familiarity with containerization (Docker) and orchestration (Kubernetes) for data workloads.
- Excellent problem‑solving skills and a passion for building reliable, high‑performance data systems.
Skills
Python, AWS, SQL, Apache Spark, Docker