huntsman
Super configurable async web spider
Hi, your project is really interesting! I was wondering if it is possible to make it follow links within a specific container in the first loaded page? Or manually select...
The CLI emits the following message: "child_process: customFds option is deprecated, use stdio instead."
Obey robots.txt. Minimum functionality:

**cancel all requests which globally disallow the huntsman `User-Agent`**

```text
User-agent: huntsman
Disallow: /
```

**cancel all requests for URLs which match `Disallow` statements** ```...
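As a rough illustration of the two rules above, the sketch below collects `Disallow` prefixes that apply to a given user agent and checks a path against them. The helpers `parseDisallows` and `isAllowed` are hypothetical names, not part of huntsman's API. The sketch also simplifies the robots.txt rules: it merges the `*` group with the agent-specific group, ignores `Allow` precedence, and does not handle wildcards.

```javascript
// Hypothetical helpers for illustration; not huntsman's actual API.
// Collects Disallow prefixes from groups naming `agent` or the wildcard `*`.
function parseDisallows(robotsTxt, agent) {
  const wanted = agent.toLowerCase();
  const disallows = [];
  let groupAgents = []; // user agents named by the current group
  let inRules = false;  // true once the current group has listed a rule

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rules starts a new group.
      if (inRules) {
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
    } else if (field === 'disallow') {
      inRules = true;
      const applies = groupAgents.includes(wanted) || groupAgents.includes('*');
      // An empty Disallow value means "nothing is disallowed".
      if (applies && value !== '') disallows.push(value);
    } else if (field === 'allow') {
      inRules = true; // Allow is recognised but not evaluated in this sketch
    }
  }
  return disallows;
}

// A path is allowed when no collected prefix matches its start.
function isAllowed(disallows, path) {
  return !disallows.some((prefix) => path.startsWith(prefix));
}

const rules = parseDisallows('User-agent: huntsman\nDisallow: /\n', 'huntsman');
console.log(isAllowed(rules, '/any/page'));
```

With the example file from the issue, every path starts with `/`, so every request would be cancelled.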
- Improve timeout detection: if no URLs are queued and no connections are open, exit immediately.
- Stats update
- Show how many connections are currently open (waiting for the...
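One way to picture the idle check and the open-connection stat is a small counter object, sketched below. `CrawlState` and its method names are assumptions made for illustration and do not reflect huntsman's internals.

```javascript
// Hypothetical tracker for illustration; not huntsman's internals.
class CrawlState {
  constructor(onIdle) {
    this.queued = 0;      // URLs waiting to be requested
    this.open = 0;        // connections currently in flight
    this.onIdle = onIdle; // called when there is nothing left to do
  }

  enqueue() {
    this.queued += 1;
  }

  // A queued URL becomes an open connection.
  start() {
    this.queued -= 1;
    this.open += 1;
  }

  // A connection closed; exit right away if the crawl is idle.
  finish() {
    this.open -= 1;
    if (this.queued === 0 && this.open === 0) this.onIdle();
  }

  // Snapshot suitable for a stats display.
  stats() {
    return { queued: this.queued, open: this.open };
  }
}

const state = new CrawlState(() => console.log('idle, exiting'));
state.enqueue();
state.start();
console.log(state.stats());
state.finish();
```

Checking for idleness when a connection closes avoids waiting on a fixed timeout, since that is the only moment both counters can reach zero.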
- Record HTTP redirects
- Consider whether the original or the redirected URI should be set for `res.uri`
- Do regular expressions match against the original or the redirected URI?
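The questions above are left open in the issue. As one possible approach, the sketch below keeps both the original and the final URI along with the intermediate hops, so that either could later be used for `res.uri` or for pattern matching. `recordRedirects`, `matchesAny`, and the shape of the returned record are hypothetical; only the standard `URL` constructor is assumed to exist.

```javascript
// Hypothetical helper for illustration; not huntsman's actual API.
// `locations` holds the Location header values seen along the way, in order.
function recordRedirects(startUri, locations) {
  const chain = [startUri];
  for (const location of locations) {
    // Location may be relative, so resolve it against the previous hop.
    const previous = chain[chain.length - 1];
    chain.push(new URL(location, previous).href);
  }
  return {
    originalUri: chain[0],
    finalUri: chain[chain.length - 1],
    redirects: chain.slice(1),
  };
}

// Matching against every URI in the chain sidesteps the original-vs-redirect question.
function matchesAny(pattern, record) {
  return [record.originalUri, ...record.redirects].some((uri) => pattern.test(uri));
}

const record = recordRedirects('http://example.com/a', ['/b', 'https://example.org/c']);
console.log(record.finalUri);
```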