Onsite
Software Engineer, Data Infrastructure & Acquisition - West Lafayette, IN, USA - Speechify
About the role
Lead the design and implementation of scalable data pipelines and infrastructure that ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, using Python, AWS, and distributed processing tools.
Key Responsibilities
- Architect, build, and maintain robust data ingestion pipelines that process diverse content types (PDFs, books, web pages) at scale.
- Implement ETL workflows using Python, SQL, and Apache Spark to clean, enrich, and store data in AWS data services.
- Integrate real‑time streaming solutions with Kafka to support low‑latency content updates for the Speechify ecosystem.
- Collaborate with cross‑functional teams to define data models, quality metrics, and performance benchmarks.
- Optimize pipeline performance, troubleshoot bottlenecks, and ensure high availability and fault tolerance.
Requirements
- 5+ years of experience in data engineering or related roles.
- Proficiency in Python, SQL, and distributed processing frameworks (Spark, Flink).
- Hands‑on experience with AWS services (S3, Redshift, Glue, EMR, Kinesis).
- Strong understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a passion for building scalable, reliable data systems.
Skills
Python, AWS, SQL, Apache Spark, Kafka