GChan

Network usage

Open Wilaz opened this issue 1 year ago • 13 comments

I'm not sure what causes this, but GChan has been using 80+ Mbps of bandwidth while sitting idle, doing nothing. Attached are the logs: GChan.log

Wilaz • Sep 14 '24

Hi @Wilaz, the GChan program has recently been plagued with errors. The high bandwidth use you are seeing comes from requests being sent far too fast and repeatedly getting blocked. These errors come from new rate limiting on the 4chan servers, which kicks in when too many requests are sent. I myself was IP banned from 4chan, so even if I had had time to fix the issues, I would not have been able to test them. I've finally been unbanned, and I now have time to update the program so it plays nicely with these new restrictions.
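
Backing off when the server starts answering 429 Too Many Requests, instead of immediately resending, is the usual way to break that feedback loop. A minimal Python sketch of the general idea (GChan itself is C#); the helper name `retry_delay` and the `base`/`cap` values are illustrative assumptions, not GChan's or 4chan's actual numbers:

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Hypothetical helper: honours a Retry-After header given in whole seconds;
    anything else (missing, or an HTTP-date) falls back to capped
    exponential backoff with full jitter.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    # The upper bound doubles each attempt (base * 2**attempt), up to `cap`.
    upper = min(cap, base * (2 ** attempt))
    # Full jitter: a random wait in [0, upper] keeps clients from retrying in lockstep.
    return random.uniform(0, upper)
```

A caller sleeps for the returned number of seconds before each retry and gives up after some maximum number of attempts, so blocked requests stop piling up.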

I'm letting you all know that this work is in progress; you can see it on GitHub or follow along in the Discord in #github-alerts. I write this program in my spare time, which is much scarcer now than when I started it. Any donations are very much appreciated, no matter the size, as gratitude for the work done and motivation for the work to come. Thank you for listening and sticking around.

Donation options:
https://github.com/sponsors/Issung
https://www.paypal.com/paypalme/Issung
https://ko-fi.com/issung

Issung • Oct 17 '24

Do you accept Monero (XMR)?

Wilaz • Oct 17 '24

@Wilaz Yes, thank you :) 474nPEA2498X3Lad1LFfVPJBEhs1USog77UK5AHUWed67CmHd8TCnjM2j9BvYUPT7ZTTfQJgHmsZY7uoxYrG7AKiFDXDVWS

Issung • Oct 25 '24

Hi @Issung, this commit fixes the issues with bandwidth and thread HTML not being downloaded: "Fix 429 / Too many requests". Cookies or some other state on the web client trigger Cloudflare's DoS protection, so the change is to create a new web client instance for every request.
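
The commit itself is C#; as a language-neutral illustration of the same idea, this Python sketch builds a brand-new opener for every request so that no cookies or other client state carry over between requests. The helper names and the User-Agent string are hypothetical:

```python
import urllib.request


def make_stateless_opener() -> urllib.request.OpenerDirector:
    # build_opener() without extra handlers has no HTTPCookieProcessor,
    # so cookies set by the server are never stored or sent back.
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", "example-scraper/0.1")]  # placeholder UA
    return opener


def fetch_stateless(url: str, timeout: float = 30.0) -> bytes:
    # A fresh opener per call: nothing from a previous request can leak in.
    opener = make_stateless_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

The trade-off is that nothing is reused between requests, which is acceptable here because request volume is deliberately kept low.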

Also, unrelated to the issue, all URLs are now HTTPS. Using https instead of http avoids a redirect (which the web client handles internally).
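
A tiny Python sketch of normalising URLs to https up front so the redirect never happens; `to_https` is a hypothetical helper, not part of GChan:

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Rewrite an http:// URL to https:// so no redirect round trip is needed."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme for us
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```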

vlad-patras • Oct 28 '24

@vlad-patras Hi mate, thanks for the heads up.

A big branch is currently in development here that entirely overhauls the rate limiting to obey the 4chan suggestions, switches from WebClient to HttpClient, and performs everything asynchronously.
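
For the rate-limiting overhaul, one common shape is a shared async limiter that every request awaits before it is sent. A minimal Python asyncio sketch; the class name and the 1-second default interval are assumptions (check the current 4chan API rules for the real figure), and this is not the code from the branch:

```python
import asyncio
import time


class MinIntervalLimiter:
    """Async limiter that spaces requests at least `interval` seconds apart.

    Hypothetical sketch; create it inside the running event loop.
    """

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0

    async def wait(self) -> None:
        # The lock serialises callers, so concurrent tasks take turns in order.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
                now = time.monotonic()
            # The next request may go out no earlier than one interval from now.
            self._next_allowed = now + self.interval
```

Each request calls `await limiter.wait()` right before it is sent; sharing one limiter across all watched threads keeps the overall rate at or below one request per `interval`.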

I will use Fiddler to investigate the difference in behaviour between WebClient and HttpClient, and I'll also verify what you're saying about http/https. The documentation says to use whichever you like, but maybe it's out of date.

Issung • Oct 29 '24

You're right about the HTTPS move, thank you. This will help :)

Issung • Oct 29 '24

I just tried the branch; no issues so far. It doesn't seem to load the database from the previous version, but I just copied things over using the copy-to-clipboard feature.

vlad-patras • Nov 21 '24

After using the branch a bit more, I seem to have found two issues, both having to do with saving threads and resuming later.

Issue 1: When starting GChan with saved threads, if there was a change to a thread, all assets are downloaded again. This seems to be because SeenAssetIds in Thread.cs is not loaded from the database (even though the field comment says it should be). The application starts with an empty set and, upon scraping the thread, treats every asset as new.
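
The fix amounts to seeding the in-memory set from storage before the first scrape. A Python sketch with `sqlite3`; the `seen_assets` table, its columns, and both function names are hypothetical and not GChan's actual schema:

```python
import sqlite3
from typing import Iterable, List, Set


def load_seen_asset_ids(conn: sqlite3.Connection, thread_id: str) -> Set[str]:
    # Seed the set from the database instead of starting empty.
    rows = conn.execute(
        "SELECT asset_id FROM seen_assets WHERE thread_id = ?", (thread_id,)
    )
    return {row[0] for row in rows}


def new_assets(scraped: Iterable[str], seen: Set[str]) -> List[str]:
    # Only assets not seen before are queued; the set is updated as we go.
    fresh = []
    for asset_id in scraped:
        if asset_id not in seen:
            seen.add(asset_id)
            fresh.append(asset_id)
    return fresh
```

If `load_seen_asset_ids` is skipped, `seen` starts empty and every scraped asset looks new, which is the re-download behaviour described here.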

Issue 2: If downloads are queued for a thread and the application is closed, they do not resume after reopening (unless the thread has updates). This seems to be caused by the new functionality that skips scraping if the thread was not updated. When a thread has no changes (detected through the IfModifiedSince header), the 4chan server responds with 304 Not Modified. This is generally OK, since it avoids unnecessary processing. However, when a thread never gets further updates (e.g. because it reached the post limit), this results in lost files.
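
For reference, a conditional request looks roughly like this with Python's standard library (an illustrative sketch with hypothetical function names, not GChan's C# code). Note that `urllib` surfaces a 304 as an `HTTPError`, since it is not a 2xx status:

```python
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Optional, Tuple


def build_conditional_request(url: str, last_checked: Optional[float]) -> urllib.request.Request:
    request = urllib.request.Request(url)
    if last_checked is not None:
        # HTTP dates are always in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        request.add_header("If-Modified-Since", formatdate(last_checked, usegmt=True))
    return request


def fetch_if_modified(url: str, last_checked: Optional[float]) -> Tuple[int, bytes]:
    """Return (status, body); body is empty when the server answers 304."""
    request = build_conditional_request(url, last_checked)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing changed since last_checked, so skip scraping.
            return 304, b""
        raise
```

A 304 only means nothing changed since `last_checked`; it says nothing about whether downloads queued earlier ever finished, which is the gap described in Issue 2.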

vlad-patras • Nov 21 '24

These changes should fix the issues above: "Fix resuming download after app close and re-open"

SeenAssetIds is now initialized with the saved IDs from the DB, which I assume was intended given the field comment. This avoids re-downloads, since there's already a check that skips seen IDs.

IfModifiedSince is disabled if the previously known file count is greater than the seen IDs count. This causes the thread to be scraped again and the unseen assets to be queued.
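
This rule can be sketched as a small header-building function. Python, with hypothetical names; the two counts stand in for the saved file count and the size of SeenAssetIds:

```python
from email.utils import formatdate
from typing import Dict, Optional


def request_headers(last_checked: Optional[float], known_file_count: int,
                    seen_id_count: int) -> Dict[str, str]:
    """Build headers for a thread request (hypothetical helper).

    If the thread is known to have more files than there are seen asset IDs,
    some downloads never completed (e.g. the app closed with a queue), so
    If-Modified-Since is left out to force a full scrape. Otherwise the
    header is sent and a 304 Not Modified answer can be trusted.
    """
    headers: Dict[str, str] = {}
    if last_checked is not None and known_file_count <= seen_id_count:
        headers["If-Modified-Since"] = formatdate(last_checked, usegmt=True)
    return headers
```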

vlad-patras • Nov 21 '24

@vlad-patras I love how familiar you're getting with the code; you're spotting problems I've missed! This is exactly the kind of thing I was hoping to get out of releasing early beta versions. I'll get that fixed as soon as I get a moment :)

Issung • Nov 22 '24

@vlad-patras New release with your fixes here: https://github.com/Issung/GChan/releases/tag/v6.3-beta :)

Issung • Nov 29 '24

@Issung Thanks! It's great having a project still maintained to keep up with all the 4chan changes.

I got the latest beta changes and everything looks to be working fine.

There is another small issue I encountered. When the subject can't be determined, even with the new rules, sanitization fails with a NullReferenceException. These threads are quite rare, but here is an example: vm/thread/1516424

2024-11-27 23:34:10.7624 [Error] System.NullReferenceException: Object reference not set to an instance of an object. ...\Thread.cs:126 System.NullReferenceException: Object reference not set to an instance of an object. at GChan.Utils.RemoveCharactersFromString(String input, Char[] chars) in ....\Utils.cs:line 231

I just added a null check locally so it starts downloading. Maybe the first comment could be trimmed instead of ignored when it's too long, but the "No Subject" fallback doesn't bother me.
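
A null-safe version of the subject logic could look like the following Python sketch. It is not GChan's Utils.RemoveCharactersFromString; the function names, the 50-character limit, and the character set (the characters Windows forbids in file names) are assumptions. It also shows the suggested trimming of an over-long first comment:

```python
from typing import Optional

# Characters Windows does not allow in file names (assumed sanitisation set).
INVALID_CHARS = '<>:"/\\|?*'


def remove_characters(text: Optional[str], chars: str = INVALID_CHARS) -> str:
    # Null-safe: a missing subject becomes "" instead of raising.
    if text is None:
        return ""
    return "".join(ch for ch in text if ch not in chars)


def thread_subject(subject: Optional[str], first_comment: Optional[str],
                   max_len: int = 50, fallback: str = "No Subject") -> str:
    # Prefer the real subject, then the first comment trimmed to max_len,
    # and only fall back to the placeholder when both are empty or missing.
    for candidate in (subject, first_comment):
        cleaned = remove_characters(candidate).strip()
        if cleaned:
            return cleaned[:max_len].rstrip()
    return fallback
```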

vlad-patras • Nov 29 '24

Too easy, new release with the fix: https://github.com/Issung/GChan/releases/tag/v6.3.1-beta

Issung • Nov 29 '24