PySitemap
🕸️ Spider Sitemap - Simple Python 3 crawler that automatically navigates your website, discovers all pages, and generates a complete XML sitemap. Easy to configure and blazing fast!
Hello, there seems to be a restriction on URL length somewhere: URLs longer than 300 characters are, for some reason, not included in the sitemap. It should be added that they...
https://github.com/Cartman720/PySitemap/blob/61c728f6b7824dcdace7ac61ca05cf267c52e23e/crawler.py#L26 Putting this under a try block should stop the script from crashing
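A minimal sketch of that suggestion, assuming the linked line is the `urllib.request.urlopen(url)` call shown in the tracebacks on this page. The helper name `fetch` and the `timeout` default are hypothetical and not part of PySitemap; the idea is only that a failed request returns `None` so the crawler can skip the page instead of crashing.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    Hypothetical helper: not taken from PySitemap's crawler.py.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        print(f"HTTP {exc.code} for {url}")
    except urllib.error.URLError as exc:
        # No usable answer: DNS failure, refused connection, missing file, ...
        print(f"Could not reach {url}: {exc.reason}")
    except (ValueError, OSError) as exc:
        # Malformed URL, or a socket-level error such as a read timeout.
        print(f"Skipping {url}: {exc}")
    return None
```

`HTTPError` is listed first because it is a subclass of `URLError`, which in turn is a subclass of `OSError`; the handlers therefore run from most to least specific.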
I get this:
```
Parsing http://canterano.somenxavier.xyz/trobar-mesures-figures-vnps/figures-1.ggb
Traceback (most recent call last):
  File "main.py", line 21, in <module>
    links = crawler.start()
  File "/home/xan/Baixades/sitemap/PySitemap-master/crawler.py", line 17, in start
    self.crawl(self.url)
  File "/home/xan/Baixades/sitemap/PySitemap-master/crawler.py", line 49,...
```
Traceback (most recent call last): File "main.py", line 21, in `<module>` links = crawler.start() File "\crawler.py", line 17, in start self.crawl(self.url) File "\crawler.py", line 26, in crawl response = urllib.request.urlopen(url) File...
```
Ampersand    | & | &amp;
Single Quote | ' | &apos;
Double Quote | " | &quot;
Greater Than | > | &gt;
Less Than    | < | &lt;
...
```
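The table above matches the standard XML entity escapes, so, assuming the issue is about URLs being written into the sitemap unescaped, the sketch below shows one way to apply them with the standard library. `escape_loc` and `url_entry` are hypothetical names for illustration, not functions from PySitemap.

```python
from xml.sax.saxutils import escape

# escape() already handles &, < and >; the two quote characters are added here.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_loc(url):
    """Entity-escape a URL so it is safe inside an XML <loc> element."""
    return escape(url, EXTRA_ENTITIES)


def url_entry(url):
    """Build one sitemap <url> entry (hypothetical helper)."""
    return f"<url><loc>{escape_loc(url)}</loc></url>"


print(url_entry("http://example.com/?a=1&b=2"))
```

`escape()` replaces `&` before any other character, so the ampersands it introduces in the entities are not escaped a second time.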
In crawler.py, there's this code:
```python
try:
    response = urllib.request.urlopen(url)
except:
    print('404 error')
    return
```
However, 404 is not the only possible exception that can occur with urllib.request.urlopen(). Solution:
```python
...
```