Data Engineer
Senior engineer responsible for designing, building, and operating fully automated threat intelligence collection systems, ingesting millions of artifacts daily from surface and deep web sources to feed AI-driven cybersecurity platforms.
About the Role: HEROIC Cybersecurity (HEROIC.com) is seeking a senior-level Threat Intelligence Data Engineer - Automated Collection & Dark Web Intelligence to design, build, and operate fully automated intelligence collection systems that power our AI-driven cybersecurity and breach intelligence platforms.
This role owns the end-to-end discovery, acquisition, and ingestion pipeline for continuously discovering, crawling, extracting, indexing, and normalizing millions of new artifacts daily, including documents, chats, forums, leaked datasets, repositories, threat actor communications, hacker marketplaces, unsecured infrastructure, and decentralized networks across the surface web, deep web, dark web, and anonymized networks.
Our Threat Research Team’s mission is aggressive: achieve near-total coverage of global breach and leak data with 99%+ automation. Your work directly enables HEROIC’s ability to identify exposures before they are weaponized.
What You Will Do:
Automated Intelligence Collection & Discovery
Architect and operate large-scale, distributed crawling and discovery systems across:
Surface web, deep web, and dark web
Hacker forums, underground marketplaces, and breach communities
Chat platforms (Telegram, Discord, IRC, WhatsApp, etc.)
Paste sites, code repositories, and social platforms used for breach disclosure
Continuously discover, archive, and download newly released datasets, logs, credentials, and artifacts the moment they appear
Dark Web, Anonymized & Decentralized Networks
Build automated collectors and archivers for anonymized and decentralized networks including:
Tor (.onion), I2P, ZeroNet, Freenet, IPFS, GNUnet, Lokinet, Yggdrasil, and similar systems
Design resilient workflows for unreliable, adversarial, or ephemeral data sources
Normalize and index data from non-traditional network protocols and formats
Infrastructure & Exposure Discovery
Develop automated scanning systems to identify:
Unsecured databases (Elasticsearch, MySQL, PostgreSQL, MongoDB, etc.)
Exposed cloud storage (S3, Azure, GCP, DigitalOcean Spaces)
Open FTP servers, backups, and misconfigured archives
Monitor and ingest data from file hosting and distribution platforms commonly used for breach dumps
Pipeline Engineering & Operations
Build ETL pipelines to clean, normalize, enrich, and index structured and unstructured data
Implement advanced anti-bot evasion strategies (proxy rotation, fingerprinting, CAPTCHA mitigation, session management)
Integrate collected intelligence into centralized databases and search systems
Design APIs and internal tooling to support downstream analysis and AI/ML workflows
Implement advanced anti-bot, evasion, and resiliency techniques (proxy rotation, fingerprinting, CAPTCHA mitigation, session handling)
Automate deployment, scaling, and monit
Posted June 23, 2026