goscrape
Web scraper that can create an offline readable version of a website
URL:
Log:
```
2021-06-29T06:10:25.020Z INFO External URL {"URL": "data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs%3D"}
2021-06-29T06:10:25.025Z INFO Downloading {"URL": "data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs%3D"}
2021-06-29T06:10:25.031Z ERROR Scraping failed {"error": "Get \"data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs%3D\": unsupported protocol scheme \"data\""}
```
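The log shows a `data:` URI being handed to the HTTP client, which rejects the scheme. One possible guard, sketched here and not taken from goscrape's code, is to check the scheme before downloading. The helper name `isDownloadable` is hypothetical, and the sketch assumes relative links have already been resolved to absolute URLs.

```go
package main

import (
	"fmt"
	"net/url"
)

// isDownloadable is a hypothetical helper (not part of goscrape).
// It reports whether rawURL has a scheme an HTTP client can fetch.
// Inline "data:" URIs, "mailto:" links and the like return false.
// Assumption: rawURL is already absolute; a relative link has an
// empty scheme and would also return false.
func isDownloadable(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return u.Scheme == "http" || u.Scheme == "https"
}

func main() {
	candidates := []string{
		"https://www.example.com/logo.png",
		"data:image/gif;base64,R0lGODlhAQABAAD/ACwAAAAAAQABAAACADs%3D",
	}
	for _, c := range candidates {
		fmt.Println(isDownloadable(c), c)
	}
}
```

A scraper using a check like this would skip the `data:` image, since its bytes are already embedded in the page, instead of reporting a failed download.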
Example site: origami.guide. This tool is unable to get all images from this site and similar sites.
`example.org/cdn-cgi/styles/fonts/opensans-600.svg#open_sanssemibold`
For example:
```
https://www.example.com/category/blog-post/
https://www.example.com/category/blog-post
```
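The excerpt does not say how these two forms should be handled. If the intent is to treat them as the same page, which is an assumption, a minimal sketch of trailing-slash normalization could look like this. `normalizeURL` is a hypothetical name, not a goscrape function.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL is a hypothetical helper (not part of goscrape).
// It removes a single trailing slash from the path so that
// ".../blog-post/" and ".../blog-post" produce the same key.
// The root path "/" is left unchanged.
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Path != "/" {
		u.Path = strings.TrimSuffix(u.Path, "/")
	}
	return u.String(), nil
}

func main() {
	a, _ := normalizeURL("https://www.example.com/category/blog-post/")
	b, _ := normalizeURL("https://www.example.com/category/blog-post")
	fmt.Println(a == b)
}
```

Note that a server may legitimately serve different content for the two forms, so whether to merge them is a design decision and not a given.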
This will fix some problems, such as images referenced in CSS.
* update goreleaser action to latest (v5)
* add docker build to goreleaser config
* add docker login step to release workflow to authenticate to GHCR
* add Dockerfile
I started a simple scrape; after about 55 minutes it crashed with an OOM kill from the Linux kernel, while using 3.5 GB on a 4 GB machine. My guess is that this is because the entire "queue" of what to download,...