onsite
Software Engineer, Data Infrastructure & Acquisition - Kochi, India - Speechify
Lead the design and implementation of scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for Speechify’s text‑to‑speech platform, leveraging Python, AWS, and modern streaming/ETL technologies.
About the role
Key Responsibilities
- Design, build, and maintain robust data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into a unified data lake.
- Implement ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift) to transform and enrich data for downstream consumption.
- Develop real‑time streaming solutions with Kafka to support low‑latency content updates and feature flagging.
- Collaborate with data scientists and product teams to expose clean, high‑quality datasets via SQL interfaces and APIs.
- Monitor pipeline performance, troubleshoot failures, and continuously optimize for cost and speed.
Requirements
- 3+ years of experience building production‑grade data pipelines in a cloud environment.
- Excellent problem‑solving skills and a passion for clean, maintainable code.
- Effective communication skills and ability to work in a fully distributed team.
Skills
Python, AWS, SQL, Apache Spark, Kafka