follow symlinks for duc index ?

Open j4jes opened this issue 5 years ago • 11 comments

Is there an option for duc to actually follow symlinks when indexing? In some other programs like bacula I can append a /. to the path and it will follow symbolic links.

My symlinks are pointing to NFS mounts on other nodes, and rather than run duc index on all hosts I'd like it to index from the head node.

Thanks.

j4jes avatar Feb 22 '20 23:02 j4jes

Not at this time, as it kind of clashes with Duc's primary goal under normal circumstances. This should not be too hard to add, although I'd rather make it explicit with a --long-option instead of depending on special path notation.

zevv avatar Feb 23 '20 05:02 zevv

Thanks. When possible, it would make the duc tool extremely valuable for me.

j4jes avatar Feb 24 '20 15:02 j4jes

Quoting Jesse of the North (2020-02-24 16:01:34)

Thanks. When possible, it would make the duc tool extremely valuable for me.

Some things will get hairy though. I'm not yet sure how to handle possible loops arising from symbolic links. My hunch is to do nothing to prevent this and just provide a big fat warning that if you enable symlink following the results might be bogus and Duc might get stuck in a loop forever and eat all your disk...

-- :wq ^X^Cy^K^X^C^C^C^C

zevv avatar Feb 24 '20 15:02 zevv

Maybe it's not a feature you want in the full release then. In my case it works since there shouldn't be secondary links in the tree, but perhaps a good protection measure would be to limit the number of symlinks to follow (if possible)

j4jes avatar Feb 24 '20 16:02 j4jes
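
The idea of limiting how many symlinks get followed can be sketched as below. This is only an illustration in Python, not Duc's code (Duc is written in C and does not follow symlinks). The function name `tree_size` and the parameter `max_symlink_depth` are made up for this example, and the "visit each directory once" set keyed on device and inode number is an added assumption, not something proposed in the thread.

```python
import os

def tree_size(root, max_symlink_depth=1):
    """Sum file sizes under root, following directory symlinks at most
    max_symlink_depth hops deep and walking each directory only once.
    Hypothetical helper for illustration; symlinks to files are skipped."""
    seen_dirs = set()      # (st_dev, st_ino) of directories already walked
    total = 0
    stack = [(root, 0)]    # (path, symlink hops taken to reach it)
    while stack:
        path, hops = stack.pop()
        st = os.stat(path)  # os.stat resolves symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen_dirs:
            continue        # already counted: a loop or a second route
        seen_dirs.add(key)
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_symlink():
                    # Follow only directory symlinks, and only within budget.
                    if hops < max_symlink_depth and entry.is_dir():
                        stack.append((entry.path, hops + 1))
                    continue
                if entry.is_dir(follow_symlinks=False):
                    stack.append((entry.path, hops))
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

With `max_symlink_depth=0` this behaves like a plain walk that ignores symlinks. Note that the hop count a directory ends up with depends on which route reaches it first, which is one of the reasons a limit like this is harder to reason about than it first looks.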

Quoting Jesse of the North (2020-02-24 17:09:46)

Maybe it's not a feature you want in the full release then.

Hmm yes, but there is only one build type, so there is no such thing as a "full release" I'm afraid

In my case it works since there shouldn't be secondary links in the tree, but perhaps a good protection measure would be to limit the number of symlinks to follow (if possible)

That's not trivial: each subtree would have to keep track of the symlink depth it has followed, and that still does not guarantee that following symlinks won't create a huge mess by counting directories more than once.

I'll have to give this another think...

-- :wq ^X^Cy^K^X^C^C^C^C

zevv avatar Feb 24 '20 17:02 zevv

"Ico" == Ico Doornekamp writes:

Ico> Quoting Jesse of the North (2020-02-24 16:01:34)

Thanks. When possible, it would make the duc tool extremely valuable for me.

Ico> Some things will get hairy though, I'm not yet sure how to handle
Ico> possible loops arising from symbolic links, My hunch is to do
Ico> nothing to prevent this and just provide a big fat warning that
Ico> if you enable symlink following the results might be bogus and
Ico> Duc might get stuck in a loop forever and eat all your disk...

Why would that make things simpler, is what I ask? We already have to handle symlink loops inside a directory/disk tree, so it shouldn't be any worse.

I agree that this shouldn't be the default though, because it could lead to problems.

Jesse,

Can you give a more detailed example of what you're trying to accomplish here with this suggestion? Don't need your specific details, but I'd like a more concrete proposal so we can examine it more.

For example, if I have three separate filesystems on my system:

/
/foo
/bar

and I index /, duc will happily descend into all of them.

If you're looking to do something more like:

/data/foo
/data/bar
/data/sah
/data/sah/foo -> /data/foo
/data/sah/john -> /data/bar/john

and you want to index /data/sah and have it also follow the symlinks above where the tree is rooted and index them so it looks like:

/data/sah
/data/sah/foo
/data/sah/john

and they're all listed as one filesystem... then maybe kinda sorta. Not sure I like this, because the symlink doesn't really make it all one tree under 'sah': when you do a 'cd /data/sah/foo' you jump out of that tree and end up in the new volume.

John

l8gravely avatar Feb 26 '20 03:02 l8gravely

"Ico" == Ico Doornekamp writes:

Ico> Quoting Jesse of the North (2020-02-24 17:09:46)

Maybe it's not a feature you want in the full release then.

Ico> Hmm yes, but there is only one build type, so there is no such thing as
Ico> a "full release" I'm afraid

In my case it works since there shouldn't be secondary links in the tree, but perhaps a good protection measure would be to limit the number of symlinks to follow (if possible)

Ico> That's not trivial, then each subtree should keep track of the
Ico> symlink depth it has followed, and that still does not guarantee
Ico> that following symlinks won't create a huge mess by counting
Ico> directories more than once.

Ico> I'll have to give this another think...

I wonder how we handle this now? And whether we have test(s) in place to detect whether we get this right, or are even consistent. It's certainly something to think about. Actually, we don't follow symlinks now, so it's a moot point.

But adding an option to follow... if we have:

/path/to/some/things
/path/to/some/other/stuff
/path/to/some/data/here/link -> ../../other

How would you track and display it? If we index it with:

duc index -d /tmp/path.db /path/to/some

it would look screwy to have that relative symlink show up and count twice, since it would be a lie:

/path/to/some/things/...
/path/to/some/other/stuff/...
/path/to/some/data/data/...
/path/to/some/data/here/link/stuff/...

That way lies madness! I don't think we should support this option at all, at least not in the way I imagine it would be done from my musings above. It's really not the truth in terms of disk space used. A link is just a logical way to share data (this goes for both soft and hard links), so the data should be counted just once in any case.

Now looking at 'du' it does have the option -L to dereference all symbolic links. But the docs are quite bare on how to handle loops and other craziness. And I'm too tired to spend time testing it.

I still think it's a bad idea overall.

John

l8gravely avatar Feb 26 '20 03:02 l8gravely
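
The double counting described above is easy to reproduce. The sketch below is an illustration only, not how Duc indexes (Duc does not follow symlinks); `walk_size` is a made-up helper built on Python's `os.walk`, whose `followlinks` argument controls whether directory symlinks are descended into.

```python
import os

def walk_size(root, follow):
    """Naively sum file sizes under root. With follow=True, directory
    symlinks are descended into and no attempt is made to detect
    directories reached twice (or loops). Hypothetical helper."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # ignore symlinks to files in both modes
            total += os.lstat(full).st_size
    return total
```

On a tree shaped like the example, with a `data/here/link -> ../../other` symlink, `walk_size(root, True)` counts everything under `other` twice, while `walk_size(root, False)` counts it once. If the link pointed at one of its own ancestors instead, the naive following walk could recurse without end.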

It's a special use case for me, so if it looks hairy don't bother. I have a cluster with several nodes mounted with NFSoRDMA as /n2, /n3, /n4, /n5 etc., but for simplicity the users don't see those mounted folders; they use /share/projects, which contains symlinks to those NFSoRDMA mounts. It works well because I can manage which nodes get which projects by directing symlinks to available space on nodes, but I would like /share/projects to get a nightly duc index too. I kind of want duc index to be oblivious to the symlinks in /share/projects, but now that I know what kind of pain that would involve I'm opting for something else. I think I will just duc index the /n2, /n3, ... folders individually even though it won't look as pretty.

j4jes avatar Feb 26 '20 04:02 j4jes
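
For the workaround of indexing the real locations individually, the list of directories to index could be derived from the symlinks themselves. A minimal sketch, assuming a directory of project symlinks like the /share/projects described above; `symlink_targets` is a hypothetical helper, not part of Duc.

```python
import os

def symlink_targets(directory):
    """Return the sorted, de-duplicated real paths that the symlinks
    directly inside `directory` point to. Regular files and
    directories are ignored. Hypothetical helper for illustration."""
    targets = set()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                targets.add(os.path.realpath(entry.path))
    return sorted(targets)
```

Each returned path could then be handed to a separate `duc index` run, so every real directory is indexed exactly once no matter how many links point at it.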

Jesse> it's a special user case scenario for me, so if it looks hairy
Jesse> don't bother. I have a cluster with several nodes mounted with
Jesse> NFSoRDMA as /n2, /n3, /n4, /n5 etc .. but then for simplicity
Jesse> the users don't see those mounted folders, they use
Jesse> /share/projects that contain symlinks to those NFSoRDMA
Jesse> mounts. It works well because I can manage which nodes get
Jesse> which projects by directing symlinks to available space on
Jesse> nodes, but I would like /share/projects to get a nightly duc
Jesse> index too. I kind of want duc index to be oblivious to the
Jesse> symlinks in /share/projects but now that I know what kind of
Jesse> pain that would involve I'm opting for something else. I think
Jesse> I will just duc index the /n2, /n3, ... folders individually
Jesse> even though it won't look as pretty.

Maybe a work around that would work better would be to use automounts instead of symlinks to build up your entries under /share/projects instead? Then when you did a 'duc index /share/projects' it would do the right thing.

I assume users can just go to /n2/some/path/... on their own as well?

In any case, I can now see how your use case makes more sense, but I still think it's not ideal. In my $WORK, I have a bunch of /data/<volume>/ NFS mounts from various Netapps, so I just index them all separately, since each volume is separate, and since they have symlinks pointing all over the place to other volumes, directories, etc. It's a nightmare to keep track of. But just keeping the top /data/data###/... names and paths consistent seems to work well.

If we were to implement this type of following, I think I'd want to have it only follow explicitly named symlinks, instead of all symlinks. Or maybe just symlinks at the top N levels of the tree. But you'd need to be careful not to get into loops by accident. Following symlinks is a good way to double count stuff if you're not careful.

John

l8gravely avatar Feb 26 '20 19:02 l8gravely

Can it not just replicate the du -L functionality? I also point to files on NFS and frequently measure the size of the files being pointed to, so after installing duc I immediately looked in the docs for link-following options, but the only one is for hard links and symlinks are not even mentioned.

I don't know enough to follow the nuances of this thread, sorry, but no discussion or theorizing is needed if it just mirrored the syntax and behavior of du right?

d-tork avatar Aug 10 '21 14:08 d-tork

"d-tork" == d-tork writes:

d-tork> Can it not just replicate the du -L functionality? I also
d-tork> point to files on NFS and frequently measure the size of the
d-tork> files being pointed to, so after installing duc I immediately
d-tork> looked in the docs for link-following options, but the only
d-tork> one is for hard links and symlinks are not even mentioned.

We skip counting symlinks, because we A) can't control where they point and B) they don't actually take up appreciable space in the filesystem.

d-tork> I don't know enough to follow the nuances of this thread,
d-tork> sorry, but no discussion or theorizing is needed if it just
d-tork> mirrored the syntax and behavior of du right?

The problem with symlinks is that we can get into a loop in the filesystem following them, so we don't. It's just much simpler.

John

l8gravely avatar Aug 10 '21 15:08 l8gravely
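
The simplest kind of loop mentioned above is a symlink that points back at one of its own ancestor directories. A small sketch of how that case could be detected; this is illustrative Python and not Duc's code, and `points_to_ancestor` is a made-up name. It only catches this direct case, not loops formed through a chain of several links.

```python
import os

def points_to_ancestor(link_path):
    """Return True if the symlink at link_path resolves to the directory
    that contains it or to any ancestor of that directory, so that
    following it would walk the same tree again. Hypothetical helper."""
    target = os.path.realpath(link_path)
    parent = os.path.realpath(os.path.dirname(link_path))
    # target is an ancestor of (or equal to) parent exactly when the
    # longest common path of the two is target itself.
    return os.path.commonpath([target, parent]) == target
```

A walker that follows links would also need something like a set of already-visited directories to catch the indirect cases, which is part of why not following symlinks at all is the simpler choice.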