Scrapegraph-ai
fix: Augment the information getting fetched from a webpage
These are follow-up changes from the discussion https://github.com/VinciGit00/Scrapegraph-ai/issues/187
We are now adding a mechanism to fetch the contents of the webpage using BeautifulSoup. Apart from the header and body, we now also fetch all the URLs on the webpage.
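As a rough illustration of the idea (not the code in this change), the extraction could look like the sketch below. It assumes the page HTML is already available as a string, that "header" refers to the page title, and that `bs4` is installed; the function name `extract_page_info` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_page_info(html: str) -> dict:
    """Hypothetical helper: pull the title, body text, and raw hrefs from HTML."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: "header" here means the <title> element of the page.
    title = soup.title.get_text(strip=True) if soup.title else ""

    # Visible text of the body, with whitespace between text nodes normalised.
    body = soup.body.get_text(" ", strip=True) if soup.body else ""

    # Every <a> tag that actually carries an href attribute, left unmodified.
    links = [a["href"] for a in soup.find_all("a", href=True)]

    return {"title": title, "body": body, "links": links}
```

The links are returned exactly as they appear in the markup, so relative paths such as `/docs` are not yet resolved.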
We will need some more work to create navigable URLs from the current ones, as sometimes they just point to sub-pages within the website (see the example below).
Building the navigable URLs and cleaning up the relevant ones will be taken up in a separate change.
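For reference, one possible way to approach that follow-up is sketched here. It is only an assumption about how the cleanup might work, built on the standard library's `urllib.parse`; the function name `to_navigable` and the filtering rules (drop fragments, keep only http/https, de-duplicate) are hypothetical choices, not decisions made in this change.

```python
from urllib.parse import urldefrag, urljoin


def to_navigable(base_url: str, hrefs: list[str]) -> list[str]:
    """Hypothetical helper: turn raw hrefs into absolute, de-duplicated URLs."""
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative paths against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))

        # Skip non-web schemes such as mailto: or javascript:.
        if not absolute.startswith(("http://", "https://")):
            continue

        # Keep the first occurrence of each URL, preserving order.
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

For example, with a base of `https://example.com/blog/`, the href `post-1` would resolve to `https://example.com/blog/post-1`.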