
Relative URLs and redirects issue

Open marapper opened this issue 6 years ago • 2 comments

For example, the URL https://www.sberbank.ru/ru/person/seizure redirects to https://www.sberbank.ru/seizure, and that page contains relative URLs such as ./1142.

If we crawl /seizure directly, all of these URLs are OK. But when we start scanning from /ru/person/seizure, every relative URL is incorrectly resolved against the pre-redirect URL (for example /ru/person/seizure/1142) and marked as broken.
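
The effect can be reproduced with the standard WHATWG `URL` constructor alone. This is only a minimal sketch, not the crawler's actual code: `resolveLink` is a hypothetical helper, and both page URLs are assumed to end with a slash so that the resolved paths match the ones described above.

```javascript
// Hypothetical helper: resolve an href found on a page against a base URL.
function resolveLink(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

// Assumption: both URLs end with a trailing slash.
const requestedUrl = 'https://www.sberbank.ru/ru/person/seizure/'; // before the redirect
const finalUrl = 'https://www.sberbank.ru/seizure/'; // after the redirect

// Resolving against the pre-redirect URL yields a path under /ru/person/seizure/.
const resolvedAgainstRequested = resolveLink('./1142', requestedUrl);

// Resolving against the post-redirect URL yields a path under /seizure/.
const resolvedAgainstFinal = resolveLink('./1142', finalUrl);

console.log(resolvedAgainstRequested);
console.log(resolvedAgainstFinal);
```

The same href produces two different absolute URLs, so the base used for resolution has to be the URL the page was actually served from.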

marapper, Nov 15 '19 10:11

Also, I think the <base href> tag isn't taken into account when the URL is built.
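
A rough sketch of how a `<base href>` value could be honoured during resolution. The function names are hypothetical, and the sketch assumes the HTML parser has already extracted the `href` attribute of the first `<base>` element (or `undefined` if there is none).

```javascript
// Hypothetical helper: compute the base URL that relative links resolve against.
// A <base href> may itself be relative, so it is resolved against the page URL.
function effectiveBase(pageUrl, baseHref) {
  if (!baseHref) {
    return pageUrl; // no <base> tag: the page URL is the base
  }
  return new URL(baseHref, pageUrl).href;
}

// Hypothetical helper: resolve a link found on the page.
function resolveWithBase(href, pageUrl, baseHref) {
  return new URL(href, effectiveBase(pageUrl, baseHref)).href;
}

// Example values are illustrative only.
console.log(resolveWithBase('1142', 'https://example.com/a/b', '/docs/'));
console.log(resolveWithBase('1142', 'https://example.com/a/b', undefined));
```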

marapper, Nov 15 '19 10:11

This cannot be fixed without changes in gaxios (see the referenced PR). Once the real page URL is available in the response, the bug can be solved by changing opts.url to res.request.responseURL in index.js:149.
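
A sketch of the proposed change, under the assumption that the response object exposes the final URL as `res.request.responseURL`. `pageBaseUrl` is a hypothetical name and the objects below are hand-written stand-ins, not real gaxios responses.

```javascript
// Hypothetical helper: prefer the URL the page was actually served from
// (after redirects) and fall back to the requested URL when it is missing.
function pageBaseUrl(opts, res) {
  const finalUrl = res && res.request && res.request.responseURL;
  return finalUrl || opts.url;
}

// Stand-in objects for illustration only.
const opts = { url: 'https://example.com/old/page/' };
const redirected = { request: { responseURL: 'https://example.com/page/' } };
const withoutFinalUrl = { request: {} };

console.log(pageBaseUrl(opts, redirected));
console.log(pageBaseUrl(opts, withoutFinalUrl));
```

The fallback keeps the current behaviour whenever the response does not carry a final URL.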

This could also become a separate feature: the crawler's JSON result could include information about which page links are redirects. There are many cases where this would be useful:

  • http links to sites that have fully moved to https
  • links without www.
  • redirects that no longer lead to the same page as before
  • and others
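
One possible shape for such a result entry, as a sketch only. The function `describeRedirect` and the field names `redirectedTo` and `kind` are hypothetical and not part of the crawler's current output.

```javascript
// Hypothetical helper: describe a redirect for inclusion in the crawl result.
// Returns null when the requested and final URLs are the same.
function describeRedirect(requestedUrl, finalUrl) {
  const from = new URL(requestedUrl);
  const to = new URL(finalUrl);
  if (from.href === to.href) {
    return null;
  }

  const stripWww = (host) => host.replace(/^www\./, '');
  const samePath = from.pathname === to.pathname && from.search === to.search;

  let kind = 'other';
  if (samePath && from.host === to.host &&
      from.protocol === 'http:' && to.protocol === 'https:') {
    kind = 'https-upgrade'; // same page, only the scheme changed
  } else if (samePath && from.protocol === to.protocol &&
      from.host !== to.host && stripWww(from.host) === stripWww(to.host)) {
    kind = 'www-change'; // same page, only the www. prefix changed
  }

  return { url: requestedUrl, redirectedTo: to.href, kind };
}

console.log(JSON.stringify(describeRedirect('http://example.com/a', 'https://example.com/a')));
```

Classifying the redirect lets a report separate harmless scheme or host upgrades from redirects that land on a different page.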

marapper, Nov 15 '19 21:11