mtree: add xattr support
This adds support for optionally writing xattrs to mtree output, and limited ability to read them from mtree input. The format of the keyword is the following, one for each xattr present for the entry:
xattr.<name>=<b64 encoded value>
This format is inspired by go-mtree by Vincent Batts, though
there is a possible slight incompatibility: libarchive's existing base64
encoder does not perform padding because it was written to encode xattrs
into PAX attributes which already have a length component. The decoder
should accept padding ok, but the mtree parser may not like the
unescaped '=' characters.
The name is escaped according to existing mtree escaping practices which
is octal encoding of [^!-}] except [ #=\] is also encoded. go-mtree
performs its own implementation of vis(3) on it with the flags
VIS_WHITE, VIS_OCTAL, and VIS_GLOB. There is some disjointedness here
but it should be compatible in the common case of no escaping of course.
Note that for output this is gated on --options=mtree:xattrs, which I've
left out of the default mtree options.
So far I've tested converting a PAX file directly to mtree with bsdtar and it seems to work well.
todo:
- [x] reading mtree currently doesn't decode the xattr name
- [ ] add to bidding?
- [x] test bsdtar from filesystem, including
--[no-]xattrs - [x] test directory xattr support
- [ ] add tests?
- [ ] when reading an mtree, also fill in xattrs from disk if available - this is tricky...
- [ ] PAX tar files made by libarchive default to putting both the LIBARCHIVE and SCHILY PAX attributes in order to encode xattrs, but this causes libarchive to store the same xattr twice internally. Therefore if converting directly from PAX to mtree you get duplicated xattrs in the output. I don't think there's any easy way to fix this other than changing how libarchive keeps track of xattrs internally.
- [x] add information to
mtree(5) - [ ] test behavior when duplicate entries with xattrs are encountered
Comments and suggestions welcome! I'm opening this now to see if this is something upstream is interested in before I go further.
While testing this I've discovered what might be considered a bug: when writing SCHILY.xattr PAX attributes, libarchive urlencodes the xattr name, but when reading back SCHILY.xattr it doesn't decode the name. LIBARCHIVE.xattr urlencodes and decodes the xattr name.
As far as I can tell schily-tools tar does not perform any sort of encoding on the name (similar to the value) but considers it a string (so it must not contain a NUL). ~~I think libarchive should just read/write the SCHILY name without any coding then.~~ GNU tar urlencodes '=' and '%' so that behavior is probably more important.
Please don't put functions into header files. We are probably looking for something like archive_base64.h + archive_base64.c
@mmatuska I've moved the functions to archive_base64.c, put the prototypes in archive_base64_private.h, and rebased onto master. Any thoughts on the other items?
I think I could add grabbing xattrs from disk (when checkfs is enabled) in parse_file() if I were to rewrite it using the archive_read_disk family of functions; otherwise I would need to copy the ifdef mess from there.
For PAX->mtree etc. duplicate xattrs issue it is a minor problem but it could potentially be fixed by storing xattrs on archive entries as a hash table with last one wins semantics instead of a linked list, but I'd probably do that in a separate PR.