spidy
Optimizations, bug fixes, command line arguments, out of scope paths and domains, and headless browser support
Feature Description
- Optimizations: general cleanup, plus a hash set of visited URLs that prevents the same URL from being visited multiple times, which frequently led to infinite crawling.
- Bug fixes: fixed potential issues with how domain restrictions were handled.
- Out of scope paths and domains: users can now specify domains and paths that are out of scope for the scan (useful for pentests).
- Headless browser support: the ability to use a headless browser rather than just `requests.get()` when making requests. This is more thorough, because the page is actually rendered and its dynamic content becomes accessible. It does lead to longer waits, but can be worth it depending on how the target site is put together. In the future, user-like interaction with the site could be implemented. This feature was implemented using `selenium`.
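
For reviewers, here is a minimal sketch of how the visited-URL hash set and the out-of-scope filter can work together. It is not the code in this Pull: the function names `in_scope` and `crawl_order`, the parameter names, and the in-memory `links` dictionary (standing in for real page fetches) are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlsplit


def in_scope(url, excluded_domains=(), excluded_paths=()):
    """Return False if the URL's host or path is listed as out of scope.

    Hypothetical helper: a domain entry also excludes its subdomains,
    and a path entry excludes every path that starts with it.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in excluded_domains):
        return False
    return not any(parts.path.startswith(p) for p in excluded_paths)


def crawl_order(start, links, excluded_domains=(), excluded_paths=()):
    """Walk a link graph breadth-first, visiting each URL at most once.

    `links` maps a URL to the URLs found on that page; it is a stand-in
    for fetching and parsing the page.
    """
    visited = set()          # hash set: O(1) average membership checks
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        # Skip URLs already seen (prevents infinite loops on cyclic links)
        # and URLs the user marked as out of scope.
        if url in visited or not in_scope(url, excluded_domains, excluded_paths):
            continue
        visited.add(url)
        order.append(url)
        queue.extend(links.get(url, []))
    return order
```

With two pages that link to each other, the walk still terminates, because the second encounter of each URL is dropped by the `visited` check. Out-of-scope URLs are filtered before they are ever added to `visited` or followed.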
Checklist
- [X] I wrote at least some documentation for this feature.
Checklist
- [X] This Pull will not add the same thing as another currently-open request.
- [X] Your Pull was made against the `rivermont:dev` branch and not `rivermont:master`.
- [X] This Pull does not commit any keys, passwords, personal data, or other private information.
- [X] I updated lines 20 and 21 in the README to reflect any changes I made.