
system:lossy/lossless encoding (for duplicates filtering)

Open pozieuto opened this issue 1 year ago • 0 comments

Hydrus has what I assume is a hardcoded rule for preferring the JPEG when a JPEG and a PNG are pixel-for-pixel duplicates. This is correct and very useful behaviour. However, it is limited to that specific case. I am encountering real-world situations where I find PNGs that are pixel-for-pixel duplicates of lossy WebPs. It is also functionally a regression: if I losslessly convert JPEGs to JXL, Hydrus no longer acknowledges that these JXLs are superior to pixel-for-pixel duplicate PNGs, even though it would have done so for the original JPEGs. As image formats beyond JPEG and PNG continue to see more usage, these situations will only grow more common, so Hydrus's coverage of them should be expanded.

The situation with JPEGs and PNGs is very straightforward, as they are exclusively lossy and lossless compression formats respectively, but most other graphics formats support both lossy and lossless modes. Among the formats Hydrus supports, only a few others are lossless-only: bitmap, icon, QOI, and GIF. (It seems silly to think of GIF as lossless given how crappy most GIFs look due to the colour limitations, but it is technically the case. I have also seen "lossy" PNG optimizers before, but that's a whole other mess.) So there needs to be a way to search by the actual encoding method, and not just the image format.
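
To illustrate what "actual encoding method" means for one mixed-mode format: a WebP file is a RIFF container, and a still image stores its bitstream in a `VP8 ` chunk (lossy) or a `VP8L` chunk (lossless). The sketch below reads only the top-level chunk headers. The function name `webp_encoding` is made up for this example and is not anything Hydrus exposes, and animated files (whose frames are nested inside `ANMF` chunks) are simply reported as `unknown` here.

```python
import struct


def webp_encoding(data: bytes) -> str:
    """Return 'lossy', 'lossless', or 'unknown' for the bytes of a WebP file.

    Hypothetical helper for illustration only; not part of Hydrus.
    """
    # A WebP file starts with: b"RIFF", a 4-byte size, then b"WEBP".
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        return "unknown"

    offset = 12
    # Walk the top-level chunks: 4-byte FourCC, 4-byte little-endian size, payload.
    while offset + 8 <= len(data):
        fourcc = data[offset:offset + 4]
        (size,) = struct.unpack_from("<I", data, offset + 4)
        if fourcc == b"VP8 ":
            return "lossy"
        if fourcc == b"VP8L":
            return "lossless"
        # Chunk payloads are padded to an even number of bytes.
        offset += 8 + size + (size & 1)

    # No top-level bitstream chunk found (e.g. an animated file).
    return "unknown"
```

Other formats would each need their own check, so this is only meant to show that the information is available in the file itself rather than in the file extension.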

One option for implementation would be expanding system:filetype, but adding separate checkboxes for lossy and lossless versions of each applicable image format seems very cumbersome. Instead, I would suggest expanding the "file properties" to include new system:lossy encoding and system:lossless encoding options.

This could be expanded to include audio/video formats in theory, some of which support lossy and lossless codecs in the same container, but given that Hydrus doesn't have much built-in support for resolving duplicates of those types in the first place, this doesn't seem nearly as important.

This would be very relevant to the work currently being done on the duplicates auto-resolution system.
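
As a rough sketch of how an auto-resolution rule could use such a property, here is the existing JPEG/PNG preference generalised to "lossy vs. lossless encoding". Everything in it (`FileInfo`, `preferred_pixel_duplicate`, the string values) is hypothetical and only illustrates the idea; it does not reflect how Hydrus implements its duplicate logic, and it assumes the two files have already been confirmed as pixel-for-pixel duplicates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record; 'encoding' is 'lossy', 'lossless', or 'unknown'."""
    name: str
    encoding: str


def preferred_pixel_duplicate(a: FileInfo, b: FileInfo) -> Optional[FileInfo]:
    """Pick the file to keep from a pair assumed to be pixel-for-pixel duplicates.

    If exactly one file is lossy-encoded and the other lossless-encoded,
    prefer the lossy one, mirroring the current JPEG-over-PNG behaviour.
    Otherwise return None and leave the decision to other rules or the user.
    """
    if {a.encoding, b.encoding} != {"lossy", "lossless"}:
        return None
    return a if a.encoding == "lossy" else b


if __name__ == "__main__":
    webp = FileInfo("picture.webp", "lossy")
    png = FileInfo("picture.png", "lossless")
    choice = preferred_pixel_duplicate(webp, png)
    print(choice.name if choice else "no automatic preference")
```

Returning `None` for every other combination keeps the rule conservative: it only acts in the one case where the JPEG/PNG reasoning carries over directly.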

pozieuto, Mar 14 '25 00:03