ShokoServer

Unrecognized files extended handling (regex CRC)

Open bigretromike opened this issue 7 years ago • 8 comments

From a user's perspective this could be in the Client, but it should be done in the Server anyway.

When there is an unrecognized file, from time to time Shoko calculates the wrong CRC. Why? I have no idea (timeout? locked resources? space radiation?).

We could add a simple regex function that looks for a CRC checksum in the file name (a common standard), and if there is a CRC there and it doesn't match the calculated one, we could mark the file as 'to rehash' or even rehash it right away. With an additional flag in the DB to hold this information, we would:

  1. rehash a file that had a wrong CRC and help the server recognize it
  2. if the CRC is missing from the file name, do nothing (there is nothing to be done)
  3. if the new hash still differs from the hash in the file name and this is the n-th time (1, 2, 3?) it has happened, we know that the user has a corrupted file and Shoko won't be able to recognize it

Additionally, it would be great if we had a new type of directory, "Bad Files" or something like that, that the corrupted files would be moved to. This would result in fewer duplicate requests to both Shoko and AniDB, resolve the issue of files not linking for some people, and help keep collections healthy and corruption-free (e.g. when someone downloads a corrupted series and adds it to AniDB, it ends up linked even though the CRC in the file name doesn't match the one they got).
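
To make the idea concrete, here is a rough sketch of the file name check in Python (Shoko itself is C#, so this is only an illustration, not the server's code). The pattern assumes the CRC32 is written as 8 hex digits inside square brackets or parentheses, and the names `crc_from_filename`, `crc32_of_file` and `check_file` are made up for this example.

```python
import re
import zlib
from pathlib import Path
from typing import Optional

# Assumed convention: 8 hex digits wrapped in [] or (), e.g. "[1A2B3C4D]".
CRC_IN_NAME = re.compile(r"[\[(]([0-9A-Fa-f]{8})[\])]")


def crc_from_filename(name: str) -> Optional[str]:
    """Return the last CRC-looking tag in the file name, upper-cased, or None."""
    matches = CRC_IN_NAME.findall(name)
    return matches[-1].upper() if matches else None


def crc32_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the CRC32 of a file in chunks and return it as 8 hex digits."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def check_file(path) -> str:
    """Compare the CRC in the file name with the CRC of the file contents."""
    expected = crc_from_filename(Path(path).name)
    if expected is None:
        return "no_crc_in_name"  # case 2: nothing to be done
    return "match" if crc32_of_file(path) == expected else "mismatch"
```

A "mismatch" result is what would mark the file as 'to rehash'; it does not say on its own whether the hasher or the file is at fault.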

bigretromike avatar Sep 18 '18 11:09 bigretromike

That could work after hashing and trying to recognize the file with AniDB, since there are already cases in AniDB where the CRC32 of the file does not match the CRC32 in the file name; they have a flag for it.

maxpiva avatar Sep 18 '18 15:09 maxpiva

CRC32 in filename is not as common as it once was. Looking at the last 12 series I've downloaded, only 1 had CRC in the name.

Maybe your files are corrupted? I've never had Shoko calculate the wrong CRC32 value unless the file was corrupted.

ElementalCrisis avatar Sep 19 '18 05:09 ElementalCrisis

It's not that rare on files that come out weekly

da3dsoul avatar Sep 19 '18 05:09 da3dsoul

@ElementalCrisis no, the file wasn't corrupted, because when I hit the rehash button it calculated a different (now correct) checksum. Also, I'm not talking about HorribleSubs; the common standard is for fansub groups that don't rip content from commercial sites. But this is handled by the rule that if the CRC is missing from the file name we do nothing (there is nothing to be done). This operation doesn't cost us anything, only 1 bit of data in the DB (or we could even reuse something else for this purpose), and it doesn't stress the collection files any more than we already did.

bigretromike avatar Sep 19 '18 05:09 bigretromike

Out of the last 100 files added to Shoko, recognized or not, 27 have a CRC in the name. 55 of the 100 are HorribleSubs, so a little over half of the remaining 45 do.

da3dsoul avatar Sep 19 '18 05:09 da3dsoul

Also, by doing this we could encourage people to use it like most did before. Some people use renaming tools, and showing them that including [crc] would benefit them even more, because it also checks integrity when reimporting collections, would help spread awareness, imo.

bigretromike avatar Sep 19 '18 06:09 bigretromike

That's weird, maybe an issue with the new hasher we switched to when adding Linux support?

ElementalCrisis avatar Sep 19 '18 06:09 ElementalCrisis

Can't tell for sure because I usually throw files in the drop folder and leave them be. And those files are on a NAS. It could be a problem with the hasher (maybe it makes errors when the CPU is at 100%, or with shared memory... I didn't test that so I don't know). But this isn't about whether the hasher is broken ;-) When the new Shoko is usable I can code this without issue if this is the case. Just wanted to write this down.

PS. Also, we are not sure whether it was the new hasher, the old hasher, or the fallback C# hasher that made those checksums.

bigretromike avatar Sep 19 '18 06:09 bigretromike

I want to say this was fixed but can someone confirm?

ElementalCrisis avatar Dec 31 '23 09:12 ElementalCrisis

Are we just going to ignore the groups that do "silent updates" to their batch releases without changing the file names? That would result in an infinite loop of rehashes if we don't have a limiter on it, and if we have a limiter then it needs to be smart enough to know it has already tried an automatic rehash once, so it won't happen periodically during e.g. an import, etc.
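
A limiter like that could be as small as a persisted attempt counter per file. The sketch below is in Python purely for illustration; `FileRecord`, `MAX_AUTO_REHASHES` and `next_action` are hypothetical names, not anything in Shoko, and the limit of 2 is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit on automatic rehashes per file (the "n" from the proposal).
MAX_AUTO_REHASHES = 2


@dataclass
class FileRecord:
    """Stand-in for a DB row; the counter must be persisted to be useful."""
    name_crc: Optional[str]          # CRC parsed from the file name, if any
    auto_rehash_attempts: int = 0    # automatic rehashes already spent
    flagged_bad: bool = False        # set once the limit is exhausted


def next_action(record: FileRecord, computed_crc: str) -> str:
    """Decide what to do after hashing, without ever looping forever."""
    if record.name_crc is None:
        return "nothing"  # no CRC in the name, nothing to compare against
    if computed_crc.upper() == record.name_crc.upper():
        return "ok"
    if record.auto_rehash_attempts < MAX_AUTO_REHASHES:
        record.auto_rehash_attempts += 1
        return "rehash"
    # Limit reached: corrupted file or a silently updated release.
    record.flagged_bad = True
    return "flag_mismatch"
```

Because the counter lives with the file record, a later import sees the spent attempts and does not start rehashing again. It still can't tell a corrupted file from a silent update; it only stops the loop.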

revam avatar Dec 31 '23 16:12 revam

I'm gonna close the issue for now, as I haven't seen this reported since and the effort to implement such a system is probably more than it's worth at the moment.

ElementalCrisis avatar Dec 31 '23 23:12 ElementalCrisis