pylinkvalidator
pylinkvalidator is a standalone, pure-Python link validator and crawler that traverses a website and reports the errors it encounters (e.g., 500 and 404 errors).
Trying to run `pylinkvalidate.py -P "https://eai.company/"` or `pylinkvalidate.py -P "https://enlightenment.ai/"` results in `The URL must not be empty: https://eai.company/` or `The URL must not be empty: https://enlightenment.ai/`, respectively. I'm not...
Add ability to exclude a URL from being checked (regex) #7
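A minimal sketch of how a regex-based exclusion could behave, assuming the crawler filters URLs before queuing them. The function name `should_check` and the example patterns are hypothetical and are not part of pylinkvalidator.

```python
import re


def should_check(url, exclude_patterns):
    """Return False when the URL matches any of the exclusion regexes."""
    return not any(re.search(pattern, url) for pattern in exclude_patterns)


# Hypothetical patterns: skip logout links and everything under /private/.
patterns = [r"/logout$", r"^https?://[^/]+/private/"]

urls = [
    "http://example.com/about",
    "http://example.com/logout",
    "https://example.com/private/report.html",
]

# Only URLs that match no exclusion pattern remain in the crawl list.
to_crawl = [url for url in urls if should_check(url, patterns)]
```

Using `re.search` rather than `re.match` lets a pattern hit anywhere in the URL; anchoring with `^` or `$` stays available when a stricter match is wanted.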
Simple fix to resolve the newline encoded at the end of the basic auth string. Symptom: ``` ERROR Crawled 1 urls with 1 error(s) in 0.01 seconds Start URL(s): http://foobar.foo...
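The snippet does not say where the newline came from. Assuming it was produced by a MIME-style Base64 encoder, the following sketch shows how such an encoder differs from `base64.b64encode` in the Python standard library. The credentials are placeholders.

```python
import base64

credentials = b"user:secret"  # placeholder credentials for illustration

# encodebytes() follows MIME conventions and appends a trailing newline.
with_newline = base64.encodebytes(credentials)

# b64encode() returns only the encoded bytes, with no trailing newline.
without_newline = base64.b64encode(credentials)

# A header value built from b64encode() contains no stray newline.
header_value = "Basic " + without_newline.decode("ascii")
```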
I had to manually delete Python 2 code containing `print` statements; afterwards, `python3 setup.py install` worked for me. Would you accept a PR to remove Python 2 compatibility (Python 2 is...
I crawled a small site and it did not crawl external URLs.
I am crawling this website to find all the pages that return a 404, but the website redirects its 404s (with a 302) to a pretty "sorry for 404" page. So...
Code: `import bs4` `import pylinkvalidator.api` `from pylinkvalidator.api import crawl_with_options as crawl_opts` `crawled_site = crawl_opts(["https://mysite.net/"], {"run-once": True, "progress": True, "console": True, "show-source": True, "allow-insecure-content": True, "parser": "lxml"})` returns (with IPython 3.7):...
Links beginning with "tel:" should be skipped. Source file: ``` + 33 (0)1 11 11 11 11 ``` Output: ``` not found (404): http://localhost:8000/tel:0033111111111 from http://localhost:8000/ ``` I didn't try...
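One way to skip such links is to look at the URI scheme before resolving the link against the page URL. The sketch below is an illustration only and is not pylinkvalidator's code: the helper `is_crawlable` and the set `CRAWLABLE_SCHEMES` are hypothetical names. It uses a regular expression for the scheme so that the check does not depend on how a particular URL parser treats an all-digit value such as `tel:0033111111111`.

```python
import re

# RFC 3986 scheme: a letter followed by letters, digits, "+", "-" or ".".
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

# Assumption: only these schemes are worth fetching.
CRAWLABLE_SCHEMES = {"http", "https"}


def is_crawlable(href):
    """Return True for relative links and for http(s) links."""
    match = _SCHEME_RE.match(href.strip())
    if match is None:
        # No scheme at all: a relative link, resolved later against the page URL.
        return True
    return match.group(1).lower() in CRAWLABLE_SCHEMES


links = ["tel:0033111111111", "mailto:someone@example.com", "/contact"]
to_check = [href for href in links if is_crawlable(href)]
```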
Is there any way to check the resources specified as relative links in the page? Thanks!
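For context, relative links are usually turned into absolute URLs by resolving them against the URL of the page that contains them. The sketch below shows that step with `urllib.parse.urljoin` from the standard library; it does not claim to describe what pylinkvalidator does internally, and the page URL and hrefs are placeholders.

```python
from urllib.parse import urljoin

page_url = "http://localhost:8000/docs/page.html"  # placeholder page URL
hrefs = ["../img/logo.png", "style.css", "/about"]  # placeholder relative links

# Resolve each relative reference against the page it was found on.
absolute = [urljoin(page_url, href) for href in hrefs]
```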
The "Usage Examples" section of readme.rst says "--parser=LXML", but it works for me only if I use "parser=lxml".