Remote
Web Scraping Engineer - Powerhouse Institute, Inc
Software Engineer
Develop and maintain scalable web scraping solutions, extract data from diverse sites and APIs, ensure data quality, and optimize pipelines using Python, Scrapy, Selenium, and cloud services.
About the role
Key Responsibilities
- Design, implement, and maintain robust web scraping scripts and frameworks to collect data from a wide range of websites and APIs.
- Ensure data integrity and accuracy through validation, cleaning, and transformation processes.
- Optimize scraping performance and resource utilization, including parallelization and headless browser management.
- Integrate scraped data into storage solutions such as relational databases, data lakes, or cloud services (e.g., AWS S3, RDS).
- Monitor, troubleshoot, and resolve issues in production scraping pipelines, implementing logging and alerting mechanisms.
Requirements
- Strong proficiency in Python with hands‑on experience using Scrapy, Selenium, or similar scraping libraries.
- Solid understanding of HTTP, HTML, CSS selectors, and RESTful API consumption.
- Experience deploying and scaling data pipelines on cloud platforms, preferably AWS.
- Proficiency with SQL databases and data modeling for storing extracted information.
- U.S. citizenship and ability to obtain or maintain a public‑trust security clearance.
Skills
python, selenium, aws, sql