Wallabagger re-downloads the page instead of using the content currently loaded

Wallabagger re-downloads the page instead of using the content currently loaded, making it unable to save paywalled articles.

It should probably parse the DOM instead of re-downloading, at least by default.
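
As a rough illustration of the idea, here is a minimal sketch of packaging the already-loaded page for saving. It assumes the server would accept the page HTML alongside the URL, which this thread does not confirm. `EntryPayload`, its field names, and `buildEntryPayload` are hypothetical names for illustration, not Wallabagger's actual code.

```typescript
// Hypothetical payload shape: the field names are assumptions for illustration.
interface EntryPayload {
  url: string;
  title: string;
  content: string; // HTML as rendered in the user's browser session
}

// Build the payload from HTML the browser has already loaded,
// instead of sending only the URL for the server to fetch again.
function buildEntryPayload(url: string, title: string, html: string): EntryPayload {
  if (html.trim() === "") {
    // Nothing was captured, so the caller should fall back to sending the URL alone.
    throw new Error("no page content captured");
  }
  return { url, title, content: html };
}

// In an extension content script this might be called as:
//   buildEntryPayload(location.href, document.title,
//                     document.documentElement.outerHTML);
```

Because the HTML comes from the user's own session, it would include content behind a paywall the user is logged in to, but also any popups or overlays present in the DOM at that moment.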

Dec 26 '20 19:12 andrewshadura

While I can see how this could be helpful, Wallabag already manages credentials for paywalls (https://doc.wallabag.org/en/user/articles/restricted.html), though I know the list of compatible sites isn't big.

On a technical note, I would guess this is less a Wallabagger issue than a Wallabag issue: the server would need to store text and/or HTML instead of fetching a URL. That would probably require rewriting part of the engine.

Mar 22 '21 21:03 pVesian

On the other hand, by re-fetching, wallabag is better able to bypass useless content, unlike a browser, where you need countless clicks to get rid of those annoying GDPR screens and "subscribe to our newsletter" popups.

May 01 '21 15:05 hydrargyrum

Although the motivation is different, I think the technical implementation for this request would be similar to #105 (here the motivation is to avoid double download / to capture paywalled content while there the motivation is to clip a subsection of the content or save extra content like comments).

Dec 22 '21 18:12 wshanks