When using advanced storage strategy, why copy the peer data file into the same dir as the output file?
Per the documentation, when using the `io.d7y.storage.v2.advance` storage strategy, the peer data file is copied into the same directory as the output file.
After running `dfget <url> -O /tmp/eddie_test`, I have observed that there are actually 3 hard links to the file. They are:

- `/tmp/eddie_test`
- `/tmp/.eddie_test.dfget.cache.<req.PeerID>`
- `<dataDir>/<req.TaskID>/<req.PeerID>/data`
Why not just copy the file to the dataDir, and link from there?
To avoid copying the daemon cache across filesystems to the specified directory.
A hard link is faster than copying the file. The `io.d7y.storage.v2.simple` storage strategy will copy the file instead.
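To illustrate why this is fast: a hard link is just another directory entry pointing at the same inode, so no data is copied, and the three paths listed above are three names for a single on-disk file. Here is a minimal sketch (not Dragonfly's actual code; the paths are stand-ins for the real ones):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func run() error {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	data := filepath.Join(dir, "data")     // stand-in for <dataDir>/<req.TaskID>/<req.PeerID>/data
	output := filepath.Join(dir, "output") // stand-in for /tmp/eddie_test
	if err := os.WriteFile(data, []byte("peer data"), 0o644); err != nil {
		return err
	}
	// os.Link adds a directory entry for the same inode: instant, no bytes copied.
	if err := os.Link(data, output); err != nil {
		return err
	}

	a, _ := os.Stat(data)
	b, _ := os.Stat(output)
	fmt.Println("same inode:", os.SameFile(a, b)) // both names refer to one file
	return nil
}

func main() {
	if err := run(); err != nil {
		panic(err)
	}
}
```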
Yep, I definitely like the hard link approach. But I don't see why links need to exist both in `<dataDir>` and at the output path.
If the `<dataDir>` is on a different filesystem, I believe we could use a symlink (and I see there is already code to do this).
So why not just download straight to `<dataDir>`, and then either hard link or symlink to the output path?
The `io.d7y.storage.v2.simple` strategy will create a symlink if the output is on a different filesystem.
It appears that in https://github.com/dragonflyoss/Dragonfly2/blob/main/client/daemon/storage/storage_manager.go#L454 the symlink is created only as a fallback when the hard link fails, when using `io.d7y.storage.v2.advance`.
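For reference, the fallback order described above can be sketched like this (a hypothetical helper, not the actual `storage_manager.go` implementation): try the hard link first, and only if it fails, e.g. with `EXDEV` when `<dataDir>` and the output path are on different filesystems, fall back to a symlink.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hardlinkOrSymlink tries a hard link first; if that fails (typically
// EXDEV across filesystems), it falls back to a symlink. Hypothetical
// sketch of the fallback order, not Dragonfly's code.
func hardlinkOrSymlink(dataFile, output string) (string, error) {
	if err := os.Link(dataFile, output); err == nil {
		return "hardlink", nil
	}
	if err := os.Symlink(dataFile, output); err != nil {
		return "", err
	}
	return "symlink", nil
}

func run() error {
	dir, err := os.MkdirTemp("", "advance-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	data := filepath.Join(dir, "data")
	if err := os.WriteFile(data, []byte("x"), 0o644); err != nil {
		return err
	}
	method, err := hardlinkOrSymlink(data, filepath.Join(dir, "out"))
	if err != nil {
		return err
	}
	fmt.Println("method:", method) // same filesystem here, so the hard link succeeds
	return nil
}

func main() {
	if err := run(); err != nil {
		panic(err)
	}
}
```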
I just updated my comment above, as my `<data_dir>` wasn't in backticks, so it was being hidden.
@jim3ma @gaius-qi To clarify, what I'd like to know is: why is it not sufficient to download the file to the `dataDir`, and then link (either hard link or symlink) to it?
Why do we need `/tmp/.eddie_test.dfget.cache.<req.PeerID>`?