php-spider

Allow square-bracket notation after anchor selector

Open netbrothers-tr opened this issue 1 year ago • 2 comments

Currently the XPathExpressionDiscoverer only allows selectors that end with /a. This means you cannot narrow a selector with square-bracket notation (an XPath predicate). Supporting it would make the spider much more powerful, and it would not require many changes.

An example of the square-bracket notation could be the following.

//a[starts-with(@href, '/') or starts-with(@href, '$url')]
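To show what such a selector would pick up, here is a minimal sketch that evaluates it with PHP's built-in DOMXPath, independent of php-spider. The HTML snippet and the value `https://example.com` substituted for `$url` are assumptions made up for illustration.

```php
<?php
declare(strict_types=1);

// Made-up sample page; only the first two links should match.
$html = <<<'HTML'
<html><body>
  <a href="/about">About</a>
  <a href="https://example.com/blog">Blog</a>
  <a href="https://other.example.org/">External</a>
  <a href="#top">Top</a>
</body></html>
HTML;

// Assumed base URL, standing in for $url in the selector above.
$url = 'https://example.com';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);

// Double quotes so that PHP interpolates $url into the expression.
$selector = "//a[starts-with(@href, '/') or starts-with(@href, '$url')]";

$hrefs = [];
foreach ($xpath->query($selector) as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}

// Prints the matched href values: '/about' and 'https://example.com/blog'.
print_r($hrefs);
```

Note that a plain `starts-with` check on the base URL would also match hosts that merely begin with the same characters, so a real selector may want a trailing slash in `$url`.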

To allow this, the spider could do one of two things. It could be less strict about the selector argument (maybe replacing endsWith with a regular expression). Or it could move the validation of the selector argument out of the constructor (into a protected function, maybe), so that a class extending XPathExpressionDiscoverer can override that method and supply its own selector validation.
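A rough sketch of the second option, under stated assumptions: `SketchXPathDiscoverer`, `BracketAwareDiscoverer` and `isValidSelector` are hypothetical names and do not reflect php-spider's actual code. The base class mimics the "must end with /a" rule described above, and the subclass overrides the check so that the last step may carry predicates. It requires PHP 8 and does not handle union (`|`) or parenthesized expressions.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the discoverer, NOT php-spider's real class.
class SketchXPathDiscoverer
{
    protected string $selector;

    public function __construct(string $selector)
    {
        // Validation is delegated so that subclasses can replace it.
        if (!$this->isValidSelector($selector)) {
            throw new InvalidArgumentException("Selector must target anchors: $selector");
        }
        $this->selector = $selector;
    }

    // Mimics the strict rule described in this issue.
    protected function isValidSelector(string $selector): bool
    {
        return str_ends_with($selector, '/a');
    }
}

// Hypothetical subclass that also accepts predicates on the final step.
class BracketAwareDiscoverer extends SketchXPathDiscoverer
{
    protected function isValidSelector(string $selector): bool
    {
        $depth = 0;          // nesting level of [...] predicates
        $quote = null;       // active quote character, if any
        $lastStepStart = 0;  // offset just after the last top-level '/'

        for ($i = 0, $length = strlen($selector); $i < $length; $i++) {
            $char = $selector[$i];
            if ($quote !== null) {
                // Inside a string literal: only the closing quote matters.
                if ($char === $quote) {
                    $quote = null;
                }
                continue;
            }
            if ($char === '"' || $char === "'") {
                $quote = $char;
            } elseif ($char === '[') {
                $depth++;
            } elseif ($char === ']') {
                $depth--;
            } elseif ($char === '/' && $depth === 0) {
                $lastStepStart = $i + 1;
            }
        }

        // Unbalanced brackets or quotes: reject.
        if ($depth !== 0 || $quote !== null) {
            return false;
        }

        // The node test is the last step with its predicates cut off.
        $lastStep = substr($selector, $lastStepStart);
        $bracket = strpos($lastStep, '[');
        $nodeTest = $bracket === false ? $lastStep : substr($lastStep, 0, $bracket);

        return trim($nodeTest) === 'a';
    }
}

// Usage: accepted by the subclass, rejected by the strict base class.
$discoverer = new BracketAwareDiscoverer("//a[starts-with(@href, '/')]");
```

The scanner is used instead of a single regular expression because slashes and brackets can appear inside quoted strings in a predicate, which a simple pattern would misread as path steps.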

netbrothers-tr, Dec 05 '24 12:12

First of all, apologies for the late reaction. My day job has been busy.

The power of the spider is that it can use any Discoverer, and you can implement custom ones too. The examples show how to set the one you want to use.

Your suggestion sounds like a great idea. Would you be open to creating an implementation of the improved XPathExpressionDiscoverer? I would be happy to adopt it as the new one. We could rename the current one to SimpleXPathExpressionDiscoverer. example_simple.php could then keep using the simple discoverer, and example_complex could use the new one with bracket notation.

mvdbos, Feb 15 '25 22:02

@mvdbos, same here, I must have missed the GitHub notification on this one. 😅 So, apologies for being quiet this long. This sounds like a great idea and I'm happy to create a PR for that.

netbrothers-tr, Nov 17 '25 10:11