onsite

Member of Technical Staff - Web Crawl Engineer - REFLECTION

Software Engineer

Lead the design and operation of large‑scale web crawling pipelines, ensuring high‑quality, fresh, and diverse data for AI models using Python, distributed frameworks, and cloud infrastructure.

About the role

Key Responsibilities

Design, implement, and maintain scalable web crawling pipelines that ingest billions of pages daily.
Optimize crawler performance and reliability across distributed clusters on AWS.
Collaborate with data scientists to define data quality metrics and ensure coverage of niche domains.
Integrate extracted data into downstream storage and indexing systems (e.g., S3, Elasticsearch).
Monitor system health, troubleshoot failures, and continuously improve throughput and latency.

Requirements

5+ years of experience building production‑grade web crawlers or large‑scale data ingestion systems.
Proficiency in Python and distributed processing frameworks (e.g., Apache Beam, Spark).
Strong background in cloud infrastructure, especially AWS services such as EC2, S3, and EMR.
Experience with data storage, indexing, and search technologies (Elasticsearch, Solr).
Excellent problem‑solving skills and a passion for data quality and scalability.

Skills

Python, AWS

Company: REFLECTION

Department: Engineering

Location: San Francisco, CA, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Posted June 20, 2026