
When using advanced storage strategy, why copy the peer data file into the same dir as the output file?

Open embroede opened this issue 3 years ago • 7 comments

Per the documentation, when using the io.d7y.storage.v2.advance storage strategy, the peer data file is copied into the same directory as the output file.

After running dfget <url> -O /tmp/eddie_test, I have observed that there are actually 3 hard links to the file. They are:

  • /tmp/eddie_test
  • /tmp/.eddie_test.dfget.cache.<req.PeerID>
  • <dataDir>/<req.TaskID>/<req.PeerID>/data

Why not just copy the file to the dataDir, and link from there?

embroede avatar Sep 16 '22 18:09 embroede

To avoid copying the daemon cache across filesystems to the specified directory.

gaius-qi avatar Sep 19 '22 04:09 gaius-qi

A hard link is faster than copying the file. The io.d7y.storage.v2.simple storage strategy copies the file.

jim3ma avatar Sep 26 '22 01:09 jim3ma

Yep I definitely like the hard link approach. But I don't see why links need to exist in <dataDir> and in the output path.

If the <dataDir> is on a different filesystem, I believe we could use a symlink (and I see there is code already to do this).

So just download straight to <dataDir>, and then either hard link or symlink to the output path?

embroede avatar Sep 26 '22 21:09 embroede

The io.d7y.storage.v2.simple strategy will create a symlink if the output path is on a different filesystem.

jim3ma avatar Sep 27 '22 14:09 jim3ma

It appears that in https://github.com/dragonflyoss/Dragonfly2/blob/main/client/daemon/storage/storage_manager.go#L454 the symlink is done as a fallback if the hard link fails, when using io.d7y.storage.v2.advance.

embroede avatar Sep 27 '22 15:09 embroede

I just updated my comment above, as my <data_dir> wasn't in backticks, so was being hidden.

embroede avatar Oct 31 '22 19:10 embroede

@jim3ma @gaius-qi To clarify, what I'd like to know is: Why is it not sufficient to download the file to the dataDir, and then link (either hardlink or symlink) to it?

Why do we need /tmp/.eddie_test.dfget.cache.<req.PeerID>?

embroede avatar Mar 23 '23 04:03 embroede