php-spider
A configurable and extensible PHP web spider
Hello @mvdbos, I haven't found time to look into the robots.txt filter discussed in the other issue. Sorry! I stumbled on a new question you might be able to shine...
Hello @mvdbos, I hope you are doing well! I was wondering what your approach (if any) is to filtering with robots.txt patterns when using the spider. The `UriFilter` seems to...
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4. Release notes (sourced from actions/checkout's releases): v4.0.0. What's Changed: Update default runtime to node20 by @takost in actions/checkout#1436; Support fetching without the --progress option...
First of all, thanks for creating php-spider; (almost) everything I need for my project is in it. Is it possible to get the source of the spider where...
Hi, great work! I tried to crawl my own website and got the following errors (I renamed the domain). Interestingly, other domains worked fine (e.g., example.com, although...
Hi. I'm wondering if it's possible to use the link checker example only to check for valid links, and perhaps store them in a JSON or CSV file instead of...
With this prefetch filter in place, the spider skips fetching resources that have already been downloaded and are younger than the maximum age. This requires that downloads are not segmented per spider ID. A simple...
Add debug logging
Currently, the `XPathExpressionDiscoverer` only allows selectors ending with `/a`. This means that more specific selectors using the [square-bracket notation](https://docs.oracle.com/javase/tutorial/jaxp/xslt/xpath.html) are not supported. However, supporting them would make the spider so much...