Onsite
Software Engineer, Data Infrastructure & Acquisition - Albuquerque, NM, USA - Speechify
Software Engineer
About the role
Lead the design and maintenance of scalable data pipelines that ingest, transform, and store content for a global text‑to‑speech platform, using Python, AWS, and Spark to keep high‑quality data available in real time.
Key Responsibilities
- Architect and implement robust data ingestion pipelines that process diverse content types (PDFs, books, web pages) into structured formats for downstream services.
- Optimize ETL workflows using Python, SQL, and Apache Spark to handle petabyte‑scale datasets with minimal latency.
- Collaborate with cross‑functional teams to define data models, schema evolution, and data quality standards.
- Deploy and maintain data infrastructure on AWS (S3, Redshift, Glue, Lambda), ensuring high availability and cost efficiency.
- Monitor pipeline performance, troubleshoot issues, and implement automated alerts and recovery mechanisms.
Requirements
- 5+ years of experience building production data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS services (S3, Redshift, Glue, Lambda, Athena).
- Solid understanding of data modeling, schema design, and data quality best practices.
- Excellent problem‑solving skills and a passion for building reliable, scalable systems.
Skills
Python, AWS, SQL, Apache Spark