Onsite
Software Engineer, Data Infrastructure & Acquisition - Menlo Park, CA, USA - Speechify
Lead the design and maintenance of scalable data pipelines and infrastructure, leveraging Python, AWS, and Spark to ingest, transform, and store large volumes of content for Speechify’s text‑to‑speech platform.
About the role
Key Responsibilities
- Design, build, and optimize data ingestion pipelines that process millions of documents daily, ensuring high throughput and low latency.
- Develop and maintain robust ETL workflows using Python, SQL, and Apache Spark, transforming raw data into analytics‑ready formats.
- Collaborate with cross‑functional teams to define data models, schemas, and metadata standards for content, user behavior, and performance metrics.
- Implement and manage AWS services (S3, Redshift, Glue, Lambda) to support scalable storage, processing, and serverless workloads.
- Monitor pipeline health, troubleshoot failures, and continuously improve reliability and cost efficiency.
Requirements
- 5+ years of experience in data engineering or related roles, with a strong focus on large‑scale pipeline development.
- Proficiency in Python, SQL, and Spark for data processing and transformation.
- Hands‑on experience with AWS data services (S3, Redshift, Glue, Lambda, EMR).
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and a collaborative mindset in a distributed team environment.
Skills
Python, AWS, SQL, Apache Spark