onsite

Software Engineer, Data Infrastructure & Acquisition - Hyderabad, India - Speechify

Software Engineer

Build scalable data pipelines and infrastructure to ingest, transform, and serve large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, Spark, and SQL to ensure high availability and performance.

About the role

Key Responsibilities

Design, develop, and maintain robust data ingestion pipelines that process raw content from diverse sources (PDFs, web pages, documents) into structured formats for downstream services.
Implement ETL workflows using Python, Spark, and AWS services (S3, Glue, Redshift) to transform and enrich data at scale.
Collaborate with data scientists and product teams to define data models, schemas, and quality metrics that support real‑time analytics and recommendation engines.
Monitor pipeline performance, troubleshoot failures, and optimize throughput and cost across the data stack.
Document architecture, code, and operational procedures to enable seamless knowledge transfer within a distributed team.

Requirements

3+ years of experience building production data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and Spark for large‑scale data processing.
Hands‑on experience with AWS services such as S3, Glue, Redshift, and Lambda.
Solid understanding of data modeling, ETL best practices, and performance tuning.
Excellent communication skills and ability to thrive in a fully remote, cross‑functional team.

Skills

Python, AWS, SQL

Company: Speechify

Department: Engineering

Location: Telangana, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026