When using advanced storage strategy, why copy the peer data file into the same dir as the output file?
Per the documentation, when using the `io.d7y.storage.v2.advance` storage strategy, the peer data file is copied into the same directory as the output file.
After running `dfget <url> -O /tmp/eddie_test`, I have observed that there are actually 3 hard links to the file. They are:

- `/tmp/eddie_test`
- `/tmp/.eddie_test.dfget.cache.<req.PeerID>`
- `<dataDir>/<req.TaskID>/<req.PeerID>/data`
Why not just copy the file to the dataDir, and link from there?
To avoid copying the daemon cache across filesystems to the specified directory.
A hard link is faster than copying the file. The `io.d7y.storage.v2.simple` storage strategy will copy the file instead.
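To illustrate why this is fast: a hard link is just another directory entry pointing at the same inode, so no data is copied, and the three paths listed above are three names for a single on-disk file. Here is a minimal sketch (not Dragonfly's actual code; the paths are stand-ins for the real ones):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func run() error {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	data := filepath.Join(dir, "data")     // stand-in for <dataDir>/<req.TaskID>/<req.PeerID>/data
	output := filepath.Join(dir, "output") // stand-in for /tmp/eddie_test
	if err := os.WriteFile(data, []byte("peer data"), 0o644); err != nil {
		return err
	}
	// os.Link adds a directory entry for the same inode: instant, no bytes copied.
	if err := os.Link(data, output); err != nil {
		return err
	}

	a, _ := os.Stat(data)
	b, _ := os.Stat(output)
	fmt.Println("same inode:", os.SameFile(a, b)) // both names refer to one file
	return nil
}

func main() {
	if err := run(); err != nil {
		panic(err)
	}
}
```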
Yep, I definitely like the hard link approach. But I don't see why links need to exist both in `<dataDir>` and at the output path.
If the `<dataDir>` is on a different filesystem, I believe we could use a symlink (and I see there is already code to do this).
So why not just download straight to `<dataDir>`, and then either hard link or symlink to the output path?
The `io.d7y.storage.v2.simple` strategy will create a symlink if the output is on a different filesystem.
It appears that in https://github.com/dragonflyoss/Dragonfly2/blob/main/client/daemon/storage/storage_manager.go#L454 the symlink is created only as a fallback when the hard link fails, when using `io.d7y.storage.v2.advance`.
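For reference, the fallback order described above can be sketched like this (a hypothetical helper, not the actual `storage_manager.go` implementation): try the hard link first, and only if it fails, e.g. with `EXDEV` when `<dataDir>` and the output path are on different filesystems, fall back to a symlink.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hardlinkOrSymlink tries a hard link first; if that fails (typically
// EXDEV across filesystems), it falls back to a symlink. Hypothetical
// sketch of the fallback order, not Dragonfly's code.
func hardlinkOrSymlink(dataFile, output string) (string, error) {
	if err := os.Link(dataFile, output); err == nil {
		return "hardlink", nil
	}
	if err := os.Symlink(dataFile, output); err != nil {
		return "", err
	}
	return "symlink", nil
}

func run() error {
	dir, err := os.MkdirTemp("", "advance-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	data := filepath.Join(dir, "data")
	if err := os.WriteFile(data, []byte("x"), 0o644); err != nil {
		return err
	}
	method, err := hardlinkOrSymlink(data, filepath.Join(dir, "out"))
	if err != nil {
		return err
	}
	fmt.Println("method:", method) // same filesystem here, so the hard link succeeds
	return nil
}

func main() {
	if err := run(); err != nil {
		panic(err)
	}
}
```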
I just updated my comment above, as my `<data_dir>` wasn't in backticks, so it was being hidden.
@jim3ma @gaius-qi To clarify, what I'd like to know is: why is it not sufficient to download the file to the `dataDir`, and then link (either hard link or symlink) to it?
Why do we need `/tmp/.eddie_test.dfget.cache.<req.PeerID>`?