onsite
Software Engineer, Data Infrastructure & Acquisition - Anchorage, AK, USA - Speechify
Software Engineer
Lead the design and scaling of Speechify’s data ingestion and processing pipelines, leveraging Python, AWS, and Spark to transform diverse content into high‑quality audio streams for millions of users worldwide.
About the role
Key Responsibilities
- Architect and maintain robust data ingestion pipelines that collect and normalize content from PDFs, books, Google Docs, news sites, and web pages.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Lambda) to support real‑time and batch processing.
- Collaborate with cross‑functional teams to define data models, optimize query performance, and ensure data quality across the platform.
- Deploy and manage containerized services on Kubernetes, ensuring high availability and efficient resource utilization.
- Monitor pipeline health, troubleshoot failures, and continuously improve performance and cost efficiency.
Requirements
- 5+ years of experience in data engineering or related roles, with a strong focus on pipeline development.
- Proficiency in Python and SQL, plus experience with Spark or a similar big‑data framework.
- Hands‑on experience with AWS services (S3, Glue, Lambda, Redshift, Athena).
- Solid understanding of container orchestration (Kubernetes) and CI/CD practices.
- Excellent problem‑solving skills and a passion for building reliable, scalable data solutions.
Skills
Python, AWS, SQL, Apache Spark, Kubernetes