onsite
Software Engineer, Data Infrastructure & Acquisition - Hyderabad, India - Speechify
Build scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, Spark, and SQL to ensure high availability and performance.
About the role
Key Responsibilities
- Design, develop, and maintain robust data ingestion pipelines that process raw content from diverse sources (PDFs, web pages, documents) into structured formats for downstream services.
- Implement ETL workflows using Python, Spark, and AWS services (S3, Glue, Redshift) to transform and enrich data at scale.
- Collaborate with data scientists and product teams to define data models, schemas, and quality metrics that support real‑time analytics and recommendation engines.
- Monitor pipeline performance, troubleshoot failures, and optimize throughput and cost across the data stack.
- Document architecture, code, and operational procedures to enable seamless knowledge transfer within a distributed team.
Requirements
- 3+ years of experience building production data pipelines in a cloud environment.
- Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS services such as S3, Glue, Redshift, and Lambda.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent communication skills and ability to thrive in a fully remote, cross‑functional team.