Feature request - crawl sitemap.xml
We are finding that sitediff works quite well; however, crawling links from the home page may not find all the URLs on a site. We would like to be able to (optionally) add the URLs listed in sitemap.xml to the paths.
This is a good idea. We haven't been actively updating the project lately, but let's keep this one open.
I believe you should be able to add paths manually to the YAML file after an initial crawl. Cleaver, is that not still the case?
Yes, you could manually reformat the paths from sitemap.xml into paths.txt.
Yes, adding to paths.txt by parsing a JSON sitemap is the approach we have been taking. However, it is currently only semi-automated, and I would presume that a proper automation using XML instead would be useful to other people as well. :)
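For anyone who wants to automate the manual step in the meantime, here is a rough sketch of converting sitemap.xml into paths, written as a standalone Python script rather than as part of sitediff. It assumes a standard `<urlset>` sitemap and that paths.txt takes one path per line; the function names `sitemap_paths` and `write_paths_file` are made up for this example. Sitemap index files (which point to other sitemaps) are not followed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def _local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def sitemap_paths(xml_data):
    """Return the unique URL paths from a <urlset> sitemap, in document order.

    xml_data is the sitemap content as str or bytes. Only <loc> elements that
    are direct children of <url> are read, so sitemap index entries and
    extension tags such as image locations are skipped.
    """
    root = ET.fromstring(xml_data)
    paths = []
    seen = set()
    for url_el in root.iter():
        if _local_name(url_el.tag) != "url":
            continue
        for child in url_el:
            if _local_name(child.tag) != "loc" or not child.text:
                continue
            parts = urlsplit(child.text.strip())
            # Keep only the path (and query string), dropping scheme and host.
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            if path not in seen:
                seen.add(path)
                paths.append(path)
    return paths


def write_paths_file(sitemap_file, paths_file):
    """Hypothetical helper: read a local sitemap.xml, write one path per line."""
    with open(sitemap_file, "rb") as f:
        paths = sitemap_paths(f.read())
    with open(paths_file, "w", encoding="utf-8") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

Calling `write_paths_file("sitemap.xml", "paths.txt")` on a downloaded sitemap would produce a file you can merge into the existing paths.txt. Dropping the host is deliberate, since the same paths are compared against both sites.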