Feature request - crawl sitemap.xml

Open Mark-Hetherington opened this issue 4 years ago • 4 comments

We are finding that sitediff works quite well; however, crawling links from the home page may not find every URL on a site. We would like to be able to (optionally) add the URLs listed in sitemap.xml to the paths.

Mark-Hetherington avatar Mar 10 '21 05:03 Mark-Hetherington

This is a good idea. We haven't been actively updating the project lately, but let's keep this one open.

cleaver avatar Mar 10 '21 15:03 cleaver

I believe you should be able to add paths manually to the YAML file after an initial crawl. Cleaver, is that still the case?

dergachev avatar Mar 13 '21 15:03 dergachev

Yes, you could manually reformat the paths from sitemap.xml into paths.txt.
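As a rough illustration of that manual step, here is a minimal Python sketch that pulls the `<loc>` entries out of a standard sitemap.xml and writes one path per line. It is not part of sitediff: the function names `sitemap_to_paths` and `write_paths` are hypothetical, and the one-path-per-line layout of paths.txt is an assumption about what sitediff expects.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_to_paths(xml_data):
    """Return the unique URL paths listed in a sitemap, in document order.

    Hypothetical helper: accepts the sitemap as str or bytes.
    """
    root = ET.fromstring(xml_data)
    paths = []
    seen = set()
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if not url:
            continue
        parts = urlsplit(url)
        # Drop scheme and host; keep the path and any query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


def write_paths(sitemap_file, paths_file):
    """Convert a local sitemap.xml into a paths file, one path per line.

    Assumption: paths.txt holds one site-relative path per line.
    """
    with open(sitemap_file, "rb") as f:
        paths = sitemap_to_paths(f.read())
    with open(paths_file, "w", encoding="utf-8") as f:
        f.write("\n".join(paths) + "\n")
```

For example, `write_paths("sitemap.xml", "paths.txt")` would convert a sitemap that has already been downloaded. The sketch does not follow sitemap index files; for those, each child sitemap would need to be fetched and converted separately.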

cleaver avatar Mar 13 '21 20:03 cleaver

Yes, adding to paths.txt by parsing a JSON sitemap is the approach we have been taking. However, it is currently only semi-automated, and I would presume that proper automation using the XML sitemap instead would be useful to other people as well. :)

Mark-Hetherington avatar Mar 14 '21 05:03 Mark-Hetherington