Add support for hard links.
rustic currently saves each file's number of links, as restic does. No other support for hard links exists yet.
Restic uses the information of # of links and more (like device id) to (try to) restore hard links during restore. This so far only works for files within the restore and poses other problems, see https://github.com/restic/restic/issues/3041#issuecomment-954577982.
Correct treatment of hard links during backup and restore is hard, but should be better supported. For instance:
- Save more/better information in the metadata during backup.
- Provide various options in restore to suit the user's needs for restoring hard links.
A proper full snapshot restore that preserves hardlinks is important, otherwise restored data may suddenly take up MUCH MORE space than expected.
It's also an optimization: you don't need to download/create/write the file again, just create a hardlink to the existing one (if it's in the same snapshot).
I tried to dig a bit into this topic. Indeed it seems that the only way to detect hard links within a filesystem is to check the number of links and compare inodes - exactly what restic is doing within their restore.
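To make the detection idea concrete, here is a minimal sketch of grouping files by (device id, inode) when the link count is at least 2. The `NodeMeta` struct and the `hard_link_groups` function are hypothetical names for illustration only; they are not rustic's or restic's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node metadata (not rustic's real `Node` type).
#[derive(Debug, Clone)]
struct NodeMeta {
    path: String,
    device_id: u64,
    inode: u64,
    links: u64,
}

/// Groups paths that point to the same file, keyed by (device id, inode).
/// Only nodes with `links >= 2` are considered. Groups with a single member
/// are dropped, since their other links lie outside the given set of nodes.
fn hard_link_groups(nodes: &[NodeMeta]) -> Vec<Vec<String>> {
    let mut by_id: HashMap<(u64, u64), Vec<String>> = HashMap::new();
    for node in nodes.iter().filter(|n| n.links >= 2) {
        by_id
            .entry((node.device_id, node.inode))
            .or_default()
            .push(node.path.clone());
    }
    let mut groups: Vec<Vec<String>> = by_id
        .into_values()
        .filter(|group| group.len() >= 2)
        .collect();
    // Sort for a deterministic result; HashMap iteration order is arbitrary.
    groups.sort();
    groups
}
```

Two files with the same inode number but different device ids end up in different groups here, which is why the device id matters as soon as a snapshot spans several file systems.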
Now there are the following issues:
- a snapshot can contain multiple file systems, restic therefore uses also the device id to identify corresponding hard links
- also, we might restore to different file systems. The "file system layout" might be especially different to the one we encountered during backup which makes it very difficult to decide what to do in edge cases. E.g. hard links on one filesystem should be restored to two file systems: What do we link? Should we fall back to soft links, if possible?
- the device id on the other hand can change and therefore is generally not a good thing to save in a backup. rustic has the option --ignore-devid.
- however, with --ignore-devid we might have two files with multiple links and the same inode number which were on different file systems and therefore not hard linked to each other. We therefore should also check for identical file contents, but this still doesn't help to exactly restore file system contents - identical files could exist on two filesystems while still having hard links on each filesystem.

So, we have to accept that we may not be 100% precise during restore. However, the question is whether this is really a serious problem. Usually we would restore to the same filesystem structure, and if not, we shouldn't care too much whether hard links are exactly restored - as long as each restored file maps to the correct contents.
In some cases restoring hardlinks is important, e.g. when there are many large hardlinked files but device space is limited. Real-life example: macOS Time Machine.
There is a simple solution to make everyone happy: add an optional flag that restores hardlinks only for those users who understand what they are doing. Or a pair of flags, which only work together.
Something like: --force-restore-hardlinks --i-understand-that-hardlinks-may-be-not-precise-on-different-fs.
Hi @aawsome! Any updates on this? Maybe just implement one simple, most likely way to restore, and notify on any errors? This seems like a situation where it's too expensive to account for all the edge cases.
We can assume device ids are stable within a single snapshot, right?! To make sure, we can check device ids haven't changed for files where node.links >= 2 before writing the snapshot and if they've changed, store "unreliable_devids" in snapshot metadata.
If --ignore-devid && node.type == "file" && node.links >= 2, node.link_devid = <device_id>. We only use node.link_devid for restoring hard links.
Options for default behaviour:
- only restore hardlinks if node.devid or node.link_devid is present and snapshot.unreliable_devids == 0;
- restore hardlinks if node.devid or node.link_devid are present, warn if snapshot.unreliable_devids is not present, report/log restored hardlinks.
For backups created without node.devid, node.link_devid, snapshot.unreliable_devids, we could provide an option to guess hardlinks during restore (--guess-hardlinks) when files are in the same snapshot and have identical node properties for mode, mtime, uid, gid, inode, size, links, content.
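A rough sketch of what such guessing could look like: group entries whose listed properties are all identical and treat each group as one set of hard links. Everything here is hypothetical (`LinkKey`, `Entry`, `guess_hard_links`, and representing content as a list of blob ids). Skipping groups that are larger than the stored link count is an extra safety assumption of this sketch, not part of the proposal above.

```rust
use std::collections::HashMap;

/// The properties that must all match before two entries are guessed to be
/// hard links of each other. `content` stands in for the list of blob ids.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct LinkKey {
    mode: u32,
    mtime: i64,
    uid: u32,
    gid: u32,
    inode: u64,
    size: u64,
    links: u64,
    content: Vec<String>,
}

/// Hypothetical snapshot entry: a path plus its comparison key.
#[derive(Debug, Clone)]
struct Entry {
    path: String,
    key: LinkKey,
}

/// Returns groups of paths that are guessed to be hard links of one file.
fn guess_hard_links(entries: &[Entry]) -> Vec<Vec<String>> {
    let mut candidates: HashMap<&LinkKey, Vec<String>> = HashMap::new();
    for entry in entries.iter().filter(|e| e.key.links >= 2) {
        candidates
            .entry(&entry.key)
            .or_default()
            .push(entry.path.clone());
    }
    let mut groups: Vec<Vec<String>> = candidates
        .into_iter()
        // Assumption: more members than `links` means files from different
        // file systems were mixed up, so the group is ambiguous and skipped.
        .filter(|(key, paths)| paths.len() >= 2 && paths.len() as u64 <= key.links)
        .map(|(_, paths)| paths)
        .collect();
    // Sort for a deterministic result; HashMap iteration order is arbitrary.
    groups.sort();
    groups
}
```

This stays a guess: identical files on two file systems with the same inode number would still be merged as long as the group size does not exceed the link count.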
When restoring hardlinks across filesystems:
- --hardlinks-fallback-duplicate (default);
- --hardlinks-fallback-symlink;
- report/log the hardlink and the fallback method used when it happens.
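The fallback behaviour could be sketched roughly as follows, assuming a Unix system. `Fallback`, `Outcome`, `restore_link` and `restore_link_with` are hypothetical names, and the flags above are only proposals. The hard-link call is passed in as a parameter so the cross-device case can be exercised without two real file systems. The `EXDEV` value is hard-coded as 18, which is its value on Linux and macOS; a real implementation would rather take it from the libc crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// errno for "cross-device link" (assumption: 18, as on Linux and macOS).
const EXDEV: i32 = 18;

/// What to do when a hard link cannot be created across file systems.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fallback {
    Duplicate,
    Symlink,
}

/// What actually happened, so the caller can report/log it.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    HardLinked,
    Duplicated,
    Symlinked,
}

/// Tries `hard_link(existing, new)` and applies the fallback only when the
/// error is EXDEV. Any other error is returned unchanged.
fn restore_link_with<F>(
    hard_link: F,
    existing: &Path,
    new: &Path,
    fallback: Fallback,
) -> io::Result<Outcome>
where
    F: Fn(&Path, &Path) -> io::Result<()>,
{
    match hard_link(existing, new) {
        Ok(()) => Ok(Outcome::HardLinked),
        Err(err) if err.raw_os_error() == Some(EXDEV) => match fallback {
            Fallback::Duplicate => {
                fs::copy(existing, new)?;
                Ok(Outcome::Duplicated)
            }
            Fallback::Symlink => {
                // The link target is stored exactly as given in `existing`.
                std::os::unix::fs::symlink(existing, new)?;
                Ok(Outcome::Symlinked)
            }
        },
        Err(err) => Err(err),
    }
}

/// Convenience wrapper that uses the real `std::fs::hard_link`.
fn restore_link(existing: &Path, new: &Path, fallback: Fallback) -> io::Result<Outcome> {
    restore_link_with(|a, b| fs::hard_link(a, b), existing, new, fallback)
}
```

Returning an `Outcome` instead of just `()` is what makes the "report/log" item cheap to implement: the caller sees which method was used for each link.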