Fix `dir::get_size`'s handling of symlinks + directories
This makes dir::get_size give a correct calculation of the size of a path of directory, that now matches that returned by eg: du --bytes -s <path>.
Before:
- Symlinks were followed, causing double-counting of symlinked files (see #59).
- Whilst the metadata size of subdirectory entries was counted correctly, the metadata size of the parent directory (the initial
pathpassed intodir::get_size) was not counted. See: https://github.com/webdesus/fs_extra/issues/25#issuecomment-1172428052
Now:
- Symlinks are no longer followed (fixing the regression from #34), so that only the size of the symlink itself is included (as expected), and not the size of the symlink's target.
- If the initial
pathpassed intodir::get_sizeis a directory, the size of that directory entry itself is now correctly counted (like it was already being counted for subdirectories).
In addition, the size calculations are performed using DirEntry::metadata, rather than unnecessarily converting to a path and then passing that path to fs::{metadata,symlink_metadata}.
Fixes #25. Fixes #59.
Hi @edmorley! Thank you for your incredible work!!! I check and fix fall tests this week and will accept this PR😊
@webdesus Thank you! Did some tests fail locally for you? The test I added passed locally for me (macOS).
@edmorley yes locally, not yours test, others:
it_details_item_dir
it_details_item_dir_short
it_get_file_size
it_ls
I looked at the reason. I made this decision on purpose not to count the size of the folders. My point is:
- We copy only content. Folders are created again.
- Different file systems have different sizes of folders. And if we copy data to another file system size can be different
- ext4 folders can be large(3 MB and more), but after copying in the same file system (ext4), the folder again will be 4kb.
- Many file management programs have the same logic.
I suggest the following edit:
let mut size_in_bytes = 0; // FIRST CHANGE.
if path_metadata.is_dir() {
for entry in read_dir(&path)? {
let entry = entry?;
let entry_metadata = entry.metadata()?;
if entry_metadata.is_dir() {
size_in_bytes += get_size(entry.path())?;
} else {
size_in_bytes += entry_metadata.len();
}
}
}else{
size_in_bytes = path_metadata.len(); // SECOND CHANGE
}
Сonsider only files.
What do you think about it?
@webdesus Hi! Thank you for the reply.
So for me dir::get_size is about the size of a directory right now, not about the size of copying. The PR at the moment matches the result returned by du, which is why I feel it makes the most sense.
However if you prefer for dir::get_size not to count directory sizes, then I'm not going to strongly object (since the most important fix here is the symlink fix, and as long as that is merged, the rest is less important).
However, if dir::get_size doesn't count directory sizes, I feel this would be surprising to a lot of people, so should be mentioned in the rustdocs at the very least.
@webdesus Hope you are well :-) Any thoughts on the above?
Hi, @edmorley! Very big sorry for the delayed response. I didn't have time for open source all this time. First of all, I thought about how we could do a different method that would include folder size, but then I decided to use the old way of calculating size without folder size. I pushed changes, and if you don't mind, I can merge them.
@webdesus No problem about the delay, I know the feeling! :-)
I'm fine with you merging that change, thank you. (I still think the implementation should count directory sizes too, but will just be happy to get the other fixes in, since they are a bigger problem.)
I still think the implementation should count directory sizes too
I think it is sometimes reasonable too. But this API work a long time, and I think we can do it in another feature.