Move metadata from info.json files to DB
Background
Currently, all metadata except tags are stored in the info.json files in each title folder. The data includes reading progress, sorting options, custom cover images, and custom display names. The reason why we use info.json can be found here https://github.com/hkalexling/Mango/issues/37#issuecomment-630291545.
The Issue
- Reading and writing to the info.json files are much slower than a proper DB
- Some data like tags and thumbnails can only be stored in DB, so when the library is renamed or moved, the data would be lost (see #146)
- Some users might want to keep their library unchanged
Proposed Solution
We can have two tables in DB:
==========
TITLES
----------
id
path
signature
==========
==========
TITLE_INFO
----------
id
tags
progress
... other info of the title
==========
In the TITLES table, signature is a (mostly) unique value for a title. We can calculate it using the following procedure:
- Get all entries in the title (a list of cbz/cbr files)
- Get the file sizes of the entries as an array and sort it
- Join the array as a long string
- Calculate the CRC32 checksum of the string, and use that as the signature of the title
If anything in the title changes, the checksum would likely change as well.
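The procedure above can be sketched roughly as follows. This is only an illustration in Python, not Mango's actual code: the function name `title_signature`, the `ENTRY_EXTENSIONS` tuple, and the choice to join the sizes with no separator are all assumptions, since the proposal does not specify them.

```python
import os
import zlib

# Assumption: entries are the cbz/cbr files directly inside the title folder.
ENTRY_EXTENSIONS = (".cbz", ".cbr")


def title_signature(title_dir: str) -> int:
    """Hypothetical helper: CRC32 over the sorted entry file sizes."""
    # Steps 1 and 2: collect the size of every entry and sort the sizes.
    sizes = sorted(
        os.path.getsize(os.path.join(title_dir, name))
        for name in os.listdir(title_dir)
        if name.lower().endswith(ENTRY_EXTENSIONS)
    )
    # Step 3: join the sizes into one long string (no separator, an assumption).
    joined = "".join(str(size) for size in sizes)
    # Step 4: the CRC32 checksum of that string is the signature.
    return zlib.crc32(joined.encode("ascii"))
```

Because only file sizes are hashed, renaming or moving entries leaves the signature unchanged, while adding, removing, or resizing an entry will most likely change it. One caveat of joining without a separator is that different size lists can produce the same string (for example `[1, 23]` and `[12, 3]`), so a real implementation might prefer a delimiter.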
On library scan, if a title's path and signature match a row in the TITLES table, we assign the corresponding id to the title, and it can then retrieve its information from the TITLE_INFO table. If a title's signature matches the DB record, but the path doesn't (or the other way around), we still use the id, and we update the unmatched field to the correct value. In this way, even if a title is moved or renamed, we can still match it in the DB because its signature is still the same.
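To make the matching rules concrete, here is a minimal sketch using an in-memory SQLite table. The names `open_db` and `resolve_title_id`, the column types, and the use of random hex ids are assumptions made for illustration. The proposal also does not say what to do when the signature matches one row and the path matches another; this sketch arbitrarily checks the signature first.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory DB with a TITLES-like table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE titles (id TEXT PRIMARY KEY, path TEXT, signature TEXT)"
    )
    return conn


def resolve_title_id(conn: sqlite3.Connection, path: str, signature: str) -> str:
    """Return the id for a scanned title, repairing the unmatched field."""
    # Case 1: both path and signature match an existing row.
    row = conn.execute(
        "SELECT id FROM titles WHERE path = ? AND signature = ?",
        (path, signature),
    ).fetchone()
    if row:
        return row[0]

    # Case 2: signature matches but path does not (title was moved or renamed).
    row = conn.execute(
        "SELECT id FROM titles WHERE signature = ?", (signature,)
    ).fetchone()
    if row:
        conn.execute("UPDATE titles SET path = ? WHERE id = ?", (path, row[0]))
        return row[0]

    # Case 3: path matches but signature does not (title contents changed).
    row = conn.execute(
        "SELECT id FROM titles WHERE path = ?", (path,)
    ).fetchone()
    if row:
        conn.execute(
            "UPDATE titles SET signature = ? WHERE id = ?", (signature, row[0])
        )
        return row[0]

    # Case 4: nothing matches, so this is a new title.
    new_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO titles (id, path, signature) VALUES (?, ?, ?)",
        (new_id, path, signature),
    )
    return new_id
```

With the returned id, the title's metadata could then be looked up in the TITLE_INFO table. Note that if both the path and the signature change between two scans, the title falls through to case 4 and is treated as new.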
Conclusion
This issue serves as an RFC, so any comments and suggestions are welcome!
Sounds good to me.
Proposed fault tolerance of matching same titles:
- directories
  - can be moved or renamed
  - nested contents must not change (other than being renamed) when the directory is moved or renamed
- files: can be moved or renamed
There might be other requirements, but I think this tolerance is enough in practice, since people usually move or rename entire root titles. Above all, this prevents thumbnails from being generated repeatedly! 😄
By the way, are the calculated signatures cached automatically?
@Leeingnyo Thanks for the feedback! I took some time to implement this (not pushed yet), and I am leaning towards simply using the inode numbers as the signatures for both titles and entries. On most file systems, the inode number of a file/folder is preserved when the file is moved, renamed, or even edited.
Some operations that would cause the inode number to change:
- Reboot/remount on some file systems
- Replaced with a copied file
- Moved to a different device
But since we are also comparing the file paths, we won't lose information as long as the above changes do not happen together with a file/folder rename, with no library scan in between.
The difference between using the inode number and the original plan mentioned above is that the inode number stays the same even when the file/folder content changes, but I think this is not an issue.
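As a rough illustration of the inode idea (a Python sketch, not the Crystal code used in Mango; `inode_signature` is a hypothetical name), the signature of a single file or folder can be read straight from its metadata:

```python
import os


def inode_signature(path: str) -> str:
    """Hypothetical helper: use the inode number of a file/folder as its signature."""
    # os.stat only reads metadata, so no file contents are touched.
    return str(os.stat(path).st_ino)
```

On most file systems, calling this before and after an `os.rename` within the same device, or after appending to the file in place, should return the same value. Copying the file and replacing the original, or moving it to another device, would typically produce a new inode number, which is why the path comparison is still needed as a fallback.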
The inode number and filesize/modification date are metadata, and reading them is very fast, so I don't think we need to cache the signatures. I tested it a bit, and the scanning time does not appear to be much longer. But I am not sure how this would affect the scanning performance for network-mounted drives (see #118), so I would need to test this a bit before releasing the changes.
Again, feel free to let me know what you think!
Oh I see! Then it has more generous fault tolerance. Great! You mean that the signature of a title or entry is just the inode number of its directory or file (read directly from that single node, with no nested work), right? Not as written in the dev branch?
Oh I should have made it clearer that for titles we do generate the signatures recursively: https://github.com/hkalexling/Mango/blob/5779d225f6afece178aa5a8785f34045e84a4253/src/util/signature.cr#L10-L51
Update:
With the new metadata and library caching features in v0.24.0, Mango can handle large libraries pretty well, so we don't desperately need this feature any more. I am keeping this open so maybe we can revisit it someday.
Please have a look at #295
Yeah, good point: the JSON files are less resilient than the DB. Let me see what we can do.