onsite

Software Engineer, Data Infrastructure & Acquisition - Chennai, India - Speechify

Software Engineer

Lead the design and implementation of scalable data pipelines and infrastructure to ingest, process, and store large volumes of content for a global text‑to‑speech platform, leveraging Python, AWS, and distributed data tools.

About the role

Key Responsibilities

Architect and build robust, fault‑tolerant data pipelines that ingest raw content from diverse sources (PDFs, web pages, documents) into the platform’s data lake.
Develop and maintain ETL workflows using Python, Apache Spark, and AWS services (S3, Glue, Redshift, Athena) to transform and enrich data for downstream analytics and ML models.
Implement real‑time streaming ingestion with Kafka, ensuring low latency and high throughput for content updates.
Collaborate with data scientists and product teams to define data schemas, quality metrics, and performance benchmarks.
Improve query performance and reduce storage costs through partitioning, indexing, and careful data lake design.
Monitor pipeline health, troubleshoot failures, and continuously improve reliability and scalability.

Requirements

5+ years of experience building production‑grade data pipelines in a cloud environment.
Strong proficiency in Python, SQL, and distributed data processing frameworks (Spark, Flink).
Hands‑on experience with AWS data services (S3, Glue, Redshift, Athena, Kinesis).
Solid understanding of streaming architectures and Kafka.
Excellent problem‑solving skills and a passion for clean, maintainable code.

Skills

Python, AWS, SQL, Apache Spark, Kafka

Company: Speechify

Department: Engineering

Location: Tamil Nadu, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026