Allow having broken symlinks in the packaged cargo archive
Problem
In https://github.com/Ekleog/kannader/tree/main/smtp-queue-fs, I have a res/ folder that intentionally contains a broken symlink. This folder is properly tracked by git and is used by tests to validate that smtp-queue-fs behaves correctly when faced with a queue that contains a broken symlink.
Proposed Solution
It would be nice if cargo were able to store the same types of files as git, including broken symlinks. This would make it possible to upload this kind of crate, along with its tests, to crates.io.
Notes
I guess an alternative would be to not check in the broken symlink, and instead handle test creation and validation differently. However, that makes it harder to just add tests, since every test that involves a broken symlink would need manual care.
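A minimal sketch of that alternative, assuming a Unix platform (it uses std::os::unix::fs::symlink) and a test harness that can write to a scratch directory. The helper names make_fixture_with_broken_symlink and is_broken_symlink are hypothetical, not part of smtp-queue-fs: each test would create its dangling link at run time instead of relying on one checked into res/.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink; // assumption: Unix-only test environment
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds a fixture directory containing one regular
/// file and one dangling symlink, so the symlink never needs to be checked in.
pub fn make_fixture_with_broken_symlink(root: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(root)?;
    fs::write(root.join("entry"), b"queued mail")?;
    let link = root.join("dangling");
    // Remove a leftover link from a previous run; a missing link is fine.
    match fs::remove_file(&link) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // The target deliberately does not exist, so the link is broken.
    symlink("does-not-exist", &link)?;
    Ok(link)
}

/// Hypothetical helper: true if `path` is a symlink whose target is missing.
pub fn is_broken_symlink(path: &Path) -> bool {
    // symlink_metadata inspects the link itself without following it.
    let is_link = fs::symlink_metadata(path)
        .map(|m| m.file_type().is_symlink())
        .unwrap_or(false);
    // fs::metadata follows the link, so it fails when the target is missing.
    is_link && fs::metadata(path).is_err()
}
```

Because the fixture is rebuilt on every run, adding a new broken-symlink test is just another call to the helper rather than another file to keep in git.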
I would understand if you think that'd be a non-feature and you didn't want to allow including broken symlinks in the crate on crates.io, but I thought it was worth opening this issue to gather feedback over whether it'd be of interest. Sorry if it has already been discussed elsewhere!
Thank you for the proposal!
I totally understand what you're trying to solve. I also have no idea why Cargo was implemented like this, as the logic dates back to Cargo v0.0.1-pre😆. However, I do have concerns about implementing this.
Cargo tries its best to make every .crate file always compile. To that end, Cargo resolves all symlinks and copies the target contents into the tarball. A common use case is sharing the root README and LICENSE files within a workspace, so that each member gets a real copy of those files and is published to crates.io with a sense of integrity.
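To make the described behaviour concrete, here is a rough sketch (not Cargo's actual code; collect_dereferenced and ArchivedFile are made-up names) of a packager that reads every listed path through fs::read, which follows symlinks. A symlinked LICENSE ends up as a real copy, while a dangling symlink makes the whole collection fail, which illustrates why a broken symlink cannot survive this approach.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One entry of a hypothetical archive: the path inside the package and
/// the bytes stored for it.
#[derive(Debug, PartialEq)]
pub struct ArchivedFile {
    pub path: PathBuf,
    pub contents: Vec<u8>,
}

/// Sketch of "dereference everything" packaging: each listed path is read
/// with fs::read, which follows symlinks, so the archive only ever contains
/// regular file contents.
pub fn collect_dereferenced(root: &Path, files: &[&str]) -> io::Result<Vec<ArchivedFile>> {
    files
        .iter()
        .map(|rel| -> io::Result<ArchivedFile> {
            // Fails with NotFound if `rel` is a symlink whose target is missing.
            let contents = fs::read(root.join(*rel))?;
            Ok(ArchivedFile { path: PathBuf::from(*rel), contents })
        })
        .collect()
}
```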
If Cargo now switches the behaviour to align with git, some projects will start failing because the actual contents cannot be found during compilation. If we keep the current behaviour and only treat broken symlinks the way you proposed, the inconsistency sounds even more confusing, at least to me.
We could loosen the restriction a bit: for example, when Cargo.toml explicitly includes a broken symlink, Cargo could include it as-is and emit a warning. I am not entirely sure what consequences this route might have. People might find their files missing after publishing, and that would be really unfortunate.
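One way the "explicit include plus warning" idea could look, as a hedged sketch: decide and Decision are hypothetical, and explicitly_included stands in for whatever matching against package.include would actually do. Non-broken paths keep today's copy-the-contents behaviour; only a broken symlink that was explicitly listed is kept as a link, with a warning, and any other broken symlink is reported as an error.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What a hypothetical packager would do with one listed path.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Store the dereferenced file contents, as described for Cargo today.
    CopyContents,
    /// Store the dangling symlink itself, pointing at `target`, plus a warning.
    KeepBrokenSymlink { target: PathBuf, warning: String },
}

/// Sketch of the loosened rule. `explicitly_included` is an assumption that
/// stands for "this path matched an entry in package.include".
pub fn decide(path: &Path, explicitly_included: bool) -> io::Result<Decision> {
    let meta = fs::symlink_metadata(path)?; // does not follow the link
    let broken = meta.file_type().is_symlink() && fs::metadata(path).is_err();
    if !broken {
        return Ok(Decision::CopyContents);
    }
    if explicitly_included {
        // read_link returns the stored target without resolving it.
        let target = fs::read_link(path)?;
        let warning = format!(
            "including broken symlink {} -> {}",
            path.display(),
            target.display()
        );
        Ok(Decision::KeepBrokenSymlink { target, warning })
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("broken symlink {} is not explicitly included", path.display()),
        ))
    }
}
```

The warning is what would give people a chance to notice that a file will be shipped as a dangling link rather than finding out after publishing.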
Hmmm… as you're saying, I guess silently changing the behavior or reusing include wouldn't make much sense.
Maybe introducing an as-symlink = ["file1", ..., "fileN"] option, which would copy the symlinks verbatim without dereferencing them, would make sense? It could then error out if any file* were not a symlink, or were excluded.
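A sketch of the validation rules for the proposed as-symlink option (which does not exist in Cargo; resolve_as_symlink and VerbatimSymlink are illustrative names). fs::read_link returns the stored target without resolving it, so a dangling target is accepted, while a listed path that is not a symlink, or that is not in the packaged file set, is an error.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::path::{Path, PathBuf};

/// A hypothetical archive entry produced for an as-symlink path.
#[derive(Debug, PartialEq)]
pub struct VerbatimSymlink {
    pub path: PathBuf,
    pub target: PathBuf,
}

/// Sketch of the proposed option: every listed path must be a symlink and
/// must be part of the package file list; otherwise packaging errors out
/// instead of silently falling back to copying contents.
pub fn resolve_as_symlink(
    root: &Path,
    as_symlink: &[&str],
    packaged: &BTreeSet<&str>,
) -> Result<Vec<VerbatimSymlink>, String> {
    let mut out = Vec::new();
    for rel in as_symlink {
        if !packaged.contains(rel) {
            return Err(format!("`{rel}` is listed in as-symlink but excluded from the package"));
        }
        let full = root.join(rel);
        let meta = fs::symlink_metadata(&full).map_err(|e| format!("`{rel}`: {e}"))?;
        if !meta.file_type().is_symlink() {
            return Err(format!("`{rel}` is listed in as-symlink but is not a symlink"));
        }
        // The target is stored as-is; it may be dangling.
        let target = fs::read_link(&full).map_err(|e| format!("`{rel}`: {e}"))?;
        out.push(VerbatimSymlink { path: PathBuf::from(*rel), target });
    }
    Ok(out)
}
```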
Hmm… It could be a solution but not optimal in my opinion. There are some concerns around it.
- The manifest already has a very long list of fields. Personally, I would avoid adding more fields to it if possible. Sorry for saying so, but as-symlink does not seem like a field everybody needs.
- The real file walk happens in PathSource::list_files, which decides whether to walk the git index or to do a heuristic filesystem walk. If a symlink is in the git index, it may be fine to include it. However, the caller doesn't know which kind of walk was done, so we would need to return something indicating that, and doing so leaks some implementation details. In addition, the git walk also follows symlinks, so we may end up duplicating the filter algorithm in several places.
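For the second point, "returning something telling that" could take the form of a walk that reports each entry's kind instead of following links. This is a minimal filesystem-only sketch, not PathSource::list_files (it ignores the git index and all include/exclude filtering); walk and EntryKind are hypothetical names. It relies on DirEntry::file_type, which does not traverse symlinks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Kind of entry found by the walk; symlinks are reported, not followed.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum EntryKind {
    File,
    Symlink { target: PathBuf },
}

/// Recursive walk returning paths relative to `root` with their kind, so a
/// caller could decide per entry whether to dereference or keep the link.
pub fn walk(root: &Path) -> io::Result<Vec<(PathBuf, EntryKind)>> {
    let mut out = Vec::new();
    walk_dir(root, root, &mut out)?;
    out.sort(); // deterministic order, independent of read_dir ordering
    Ok(out)
}

fn walk_dir(root: &Path, dir: &Path, out: &mut Vec<(PathBuf, EntryKind)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // DirEntry::file_type describes the entry itself, not a link target.
        let ft = entry.file_type()?;
        let rel = path.strip_prefix(root).expect("entry is under root").to_path_buf();
        if ft.is_symlink() {
            out.push((rel, EntryKind::Symlink { target: fs::read_link(&path)? }));
        } else if ft.is_dir() {
            walk_dir(root, &path, out)?;
        } else if ft.is_file() {
            out.push((rel, EntryKind::File));
        }
    }
    Ok(())
}
```

Even in this toy form, the extra EntryKind information is exactly the kind of detail that every caller would then have to handle, which is the leakage concern raised above.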
Though I don't have any better idea than what you propose so far. 😞
Hmm, you're right, but I don't see any better option that'd avoid backwards-compatibility hazards either :/ I guess I'll leave this open, hoping other people come in with ideas!
Would you be open to a PR that mentions in the cargo package and cargo publish docs that symlinks are followed and symlinks are never(?) included in the tarball?
Would you be open to a PR that mentions in the cargo package and cargo publish docs that symlinks are followed
Yeah, I think that is appreciated! Perhaps adding one simple sentence to the second step of cargo package is good enough?
Just checked the implementation again. I believe File::open does follow symlinks, so tar sees the real contents and copies them during archiving. So yes, symlinks are followed and files will be duplicated.
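A small demonstration of that File::open behaviour, assuming a Unix platform: the path itself reports as a symlink via fs::symlink_metadata, but the opened handle describes the target, which is why an archiver reading from the handle stores a regular file. open_like_archiver and OpenedView are made-up names for illustration; this is not the code at the linked lines.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// What an archiver sees when it opens `path` with File::open.
#[derive(Debug, PartialEq)]
pub struct OpenedView {
    /// True if the path itself is a symlink (checked without following it).
    pub path_is_symlink: bool,
    /// True if the opened handle is a regular file; for a symlink this
    /// describes the target, because open already followed the link.
    pub handle_is_file: bool,
    pub contents: Vec<u8>,
}

pub fn open_like_archiver(path: &Path) -> io::Result<OpenedView> {
    let path_is_symlink = fs::symlink_metadata(path)?.file_type().is_symlink();
    // Follows symlinks; fails with NotFound if the link is dangling.
    let mut file = File::open(path)?;
    let handle_is_file = file.metadata()?.is_file();
    let mut contents = Vec::new();
    file.read_to_end(&mut contents)?;
    Ok(OpenedView { path_is_symlink, handle_is_file, contents })
}
```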
https://github.com/rust-lang/cargo/blob/82c489f1c612096c2dffc48a4091d21af4b8e046/src/cargo/ops/cargo_package.rs?plain=1#L930-L942
and symlinks are never(?) included in the tarball?
I would personally avoid this kind of statement, as I may be missing some details, or there may be workarounds that bypass how Cargo follows symlinks.
Cargo tries its best to make every .crate file always compile. Based on that fact, Cargo resolves all symlinks, and copies target contents over into tarball. A common use case is sharing root README and LICENSE file within a workspace, so that each member gets a real copy of that file and published to crates.io with a sense of integrity.
Since which version does Cargo copy the contents of symlinked files? It appears that sometimes the symlink's destination path ends up in the tarball instead of the destination's contents; see https://github.com/sstadick/cargo-bundle-licenses/issues/53
Edit: Ah, maybe these are instances of https://github.com/rust-lang/cargo/issues/5664