
Move metadata from info.json files to DB

Open • hkalexling opened this issue 5 years ago • 7 comments

Background

Currently, all metadata except tags is stored in the info.json file in each title folder. This data includes reading progress, sorting options, custom cover images, and custom display names. The reasons for using info.json are explained here: https://github.com/hkalexling/Mango/issues/37#issuecomment-630291545.

The Issue

  1. Reading from and writing to the info.json files is much slower than using a proper DB
  2. Some data, such as tags and thumbnails, can only be stored in the DB, so when the library is renamed or moved, that data is lost (see #146)
  3. Some users might want to keep their library folders unchanged, i.e., free of the info.json files Mango writes into them

Proposed Solution

We can have two tables in DB:

==========
TITLES
----------
id
path
signature
==========
==========
TITLE_INFO
----------
id
tags
progress
... other info of the title
==========
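A minimal sketch of the two tables in SQLite, using Python's built-in `sqlite3` module. Only `id`, `path`, `signature`, `tags`, and `progress` come from the proposal; the column types, the extra `display_name` column, and storing tags/progress as JSON text are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema for the proposed TITLES / TITLE_INFO tables.
# Types and the display_name column are assumptions, not Mango's schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    id        TEXT PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    signature INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS title_info (
    id           TEXT PRIMARY KEY REFERENCES titles(id),
    tags         TEXT,   -- e.g. a JSON-encoded list of tags
    progress     TEXT,   -- e.g. a JSON object mapping entry -> page
    display_name TEXT    -- custom display name, if any
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO titles VALUES (?, ?, ?)", ("t1", "/library/Foo", 123))
    conn.execute("INSERT INTO title_info (id, tags) VALUES (?, ?)", ("t1", '["action"]'))
    row = conn.execute(
        "SELECT t.path, i.tags FROM titles t JOIN title_info i ON t.id = i.id"
    ).fetchone()
    print(row)  # ('/library/Foo', '["action"]')
```

Keeping `path`/`signature` apart from the per-title info means a rename or move only touches one small row in `titles`, while everything in `title_info` stays keyed by the stable `id`.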

In the TITLES table, signature is a (mostly) unique value for a title. We can calculate it using the following procedure:

  1. Get all entries in the title (a list of cbz/cbr files)
  2. Get the file sizes of the entries as an array and sort it
  3. Join the array as a long string
  4. Calculate the CRC32 checksum of the string, and use that as the signature of the title

If an entry is added or removed, or an entry's file size changes, the checksum would very likely change as well.
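The four steps above could be sketched roughly as follows with Python's `zlib.crc32`. The set of entry extensions and the `,` separator used in step 3 are assumptions; the proposal does not specify them.

```python
import os
import zlib

# Hypothetical sketch of the size-based title signature (not Mango code).
# ENTRY_EXTS and the "," separator are assumptions.
ENTRY_EXTS = {".cbz", ".cbr", ".zip", ".rar"}

def size_signature(sizes: list) -> int:
    """Steps 2-4: sort the sizes, join them into one string, return its CRC32."""
    joined = ",".join(str(s) for s in sorted(sizes))
    return zlib.crc32(joined.encode("utf-8"))

def title_signature(title_dir: str) -> int:
    """Step 1, then steps 2-4: collect the entry files' sizes and hash them."""
    sizes = [
        os.path.getsize(os.path.join(title_dir, name))
        for name in os.listdir(title_dir)
        if os.path.splitext(name)[1].lower() in ENTRY_EXTS
    ]
    return size_signature(sizes)
```

Sorting makes the result independent of directory listing order. A separator matters because without one, sizes `[1, 234]` and `[12, 34]` would both join to `"1234"` and collide. Note that an entry whose content changes but whose size stays the same leaves the signature unchanged.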

On library scan, if a title's path and signature match a row in the TITLES table, we assign the corresponding id to the title, and it can then retrieve its information from the TITLE_INFO table. If a title's signature matches the DB record, but the path doesn't (or the other way around), we still use the id, and we update the unmatched field to the correct value. In this way, even if a title is moved or renamed, we can still match it in the DB because its signature is still the same.
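The matching rule could look roughly like the following minimal sketch. `TitleRow` and `match_title` are hypothetical names, rows are held in memory instead of a real DB, and checking the signature before the path is an assumption, since the proposal does not say which field should win if two different rows each match one of them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TitleRow:
    """One row of the proposed TITLES table (hypothetical in-memory model)."""
    id: str
    path: str
    signature: int

def match_title(rows: List[TitleRow], path: str, signature: int) -> Optional[TitleRow]:
    """Find the existing row for a scanned title, or None if it is new.

    - path and signature both match: reuse the row as-is.
    - only one matches: reuse the id and update the stale field.
    - neither matches: return None so the caller can insert a new row.
    """
    for row in rows:
        if row.path == path and row.signature == signature:
            return row
    for row in rows:
        if row.signature == signature:  # moved or renamed
            row.path = path
            return row
    for row in rows:
        if row.path == path:  # same place, contents changed
            row.signature = signature
            return row
    return None
```

Because the `id` survives either kind of change, everything stored in TITLE_INFO under that `id` follows the title automatically.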

Conclusion

This issue serves as an RFC, so any comments and suggestions are welcome!

hkalexling • Jan 14 '21 06:01

Sounds good to me.

Proposed fault tolerance when matching the same title:

  • directories
    • can be moved or renamed
    • their nested contents must not change (other than being renamed) while the directory itself is moved or renamed
  • files: can be moved or renamed

There might be other requirements, but I think this level of tolerance is sufficient, since people usually move or rename entire root titles. Above all, this prevents thumbnails from being generated repeatedly! 😄

By the way, are the calculated signatures cached automatically?

Leeingnyo • Jan 25 '21 22:01

@Leeingnyo Thanks for the feedback! I took some time to implement this (not pushed yet), and I am leaning towards simply using the inode numbers as the signatures for both titles and entries. On most file systems, the inode number of a file/folder is preserved when the file is moved, renamed, or even edited.

Some operations that would cause the inode number to change:

  • Reboot/remount on some file systems
  • Replaced with a copied file
  • Moved to a different device

But since we are also comparing the file paths, we won't lose information as long as the above changes do not happen together with a file/folder rename, with no library scan in between.

The difference between using the inode number and the original plan mentioned above is that the inode number stays the same even when the file/folder content changes, but I think this is not an issue.
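A small experiment with `os.stat(...).st_ino` illustrates the inode behaviour described above. This is a hypothetical demo, not Mango code, and it assumes a typical local file system such as ext4; network or FUSE mounts may behave differently.

```python
import os
import shutil
import tempfile
from typing import List

# Hypothetical demo: use the inode number as a signature and check how it
# behaves under a rename, an in-place edit, and a copy-replace.

def inode_signature(path: str) -> int:
    """Return the inode number of a file or folder (one stat call, no hashing)."""
    return os.stat(path).st_ino

def inode_checks() -> List[bool]:
    """Whether the inode survives a rename, an in-place edit, and a copy-replace."""
    results = []
    with tempfile.TemporaryDirectory() as lib:
        original = os.path.join(lib, "vol1.cbz")
        with open(original, "wb") as f:
            f.write(b"archive bytes")
        sig = inode_signature(original)

        # 1. Rename within the same folder: the inode is kept.
        renamed = os.path.join(lib, "Volume 01.cbz")
        os.rename(original, renamed)
        results.append(inode_signature(renamed) == sig)

        # 2. Edit the content in place: the inode is kept.
        with open(renamed, "ab") as f:
            f.write(b" more bytes")
        results.append(inode_signature(renamed) == sig)

        # 3. Replace with a copied file: the copy is a new file with its own inode.
        copy = os.path.join(lib, "copy.tmp")
        shutil.copy2(renamed, copy)
        os.replace(copy, renamed)
        results.append(inode_signature(renamed) == sig)
    return results

if __name__ == "__main__":
    print(inode_checks())  # [True, True, False] on ext4
```

The third case matches the "replaced with a copied file" item in the list above: the path is unchanged, so path matching still recovers the title.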

The inode number and filesize/modification date are metadata, and reading them is very fast, so I don't think we need to cache the signatures. I tested it a bit, and the scanning time does not appear to be much longer. But I am not sure how this would affect the scanning performance for network-mounted drives (see #118), so I would need to test this a bit before releasing the changes.

Again, feel free to let me know what you think!

hkalexling • Jan 26 '21 12:01

Oh, I see! Then it has even more generous fault tolerance. Great! You mean that the signatures of titles and entries are simply the inode numbers of their directories and files (read directly from a single node, with no recursive work), right? Not as written in the dev branch?

Leeingnyo • Jan 26 '21 16:01

Oh, I should have made it clearer that for titles we do generate the signatures recursively: https://github.com/hkalexling/Mango/blob/5779d225f6afece178aa5a8785f34045e84a4253/src/util/signature.cr#L10-L51
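For readers unfamiliar with the idea, here is a hypothetical Python sketch of a recursive title signature. It is not a port of the linked Crystal code: combining the folder's own inode with its children's signatures via a sorted, comma-joined CRC32, and skipping dot-files, are assumptions made for illustration.

```python
import os
import zlib

# Hypothetical recursive signature: a folder's signature depends on its own
# inode plus the signatures of everything inside it (not Mango's code).

def file_signature(path: str) -> int:
    """Leaf signature: the file's inode number."""
    return os.stat(path).st_ino

def dir_signature(path: str) -> int:
    parts = [os.stat(path).st_ino]
    for name in os.listdir(path):
        if name.startswith("."):
            continue  # skip hidden files such as .DS_Store (assumption)
        child = os.path.join(path, name)
        if os.path.isdir(child):
            parts.append(dir_signature(child))
        else:
            parts.append(file_signature(child))
    # Sort so the result does not depend on directory listing order.
    joined = ",".join(str(p) for p in sorted(parts))
    return zlib.crc32(joined.encode("utf-8"))
```

Renaming or moving the title folder keeps every inode, so the signature is unchanged; adding or removing an entry changes the set of parts and therefore (very likely) the checksum.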

hkalexling • Jan 27 '21 03:01

Update:

With the new metadata and library caching features in v0.24.0, Mango can handle large libraries pretty well, so we don't desperately need this feature any more. I am keeping this open so maybe we can revisit it someday.

hkalexling • Mar 19 '22 13:03

Please have a look at #295

afknst • Apr 18 '22 14:04

Yeah, good point: the JSON files are less resilient than the DB. Let me see what we can do.

hkalexling • Apr 22 '22 08:04