
mtree: add xattr support

Open · maxcrees opened this issue 5 years ago · 5 comments

This adds support for optionally writing xattrs to mtree output, along with limited support for reading them from mtree input. Each xattr on an entry gets its own keyword, in the following format:

xattr.<name>=<base64-encoded value>

This format is inspired by Vincent Batts's go-mtree, with one possible slight incompatibility: libarchive's existing base64 encoder does not emit padding, because it was written to encode xattrs into PAX attributes, which already carry a length field. The decoder should accept padded input, but the mtree parser may not handle the unescaped '=' padding characters.

The name is escaped following existing mtree practice: bytes matching [^!-}] are octal-encoded, as are the characters in [ #=\]. go-mtree instead applies its own implementation of vis(3) with the flags VIS_WHITE, VIS_OCTAL, and VIS_GLOB. The two schemes are not identical, but they should be compatible in the common case where the name needs no escaping.

Note that for output this is gated on --options=mtree:xattrs, which I've left out of the default mtree options.

So far I've tested converting a PAX file directly to mtree with bsdtar and it seems to work well.

todo:

  • [x] reading mtree currently doesn't decode the xattr name
  • [ ] add to bidding?
  • [x] test bsdtar from filesystem, including --[no-]xattrs
  • [x] test directory xattr support
  • [ ] add tests?
  • [ ] when reading an mtree, also fill in xattrs from disk if available - this is tricky...
  • [ ] By default, PAX archives written by libarchive encode each xattr with both a LIBARCHIVE and a SCHILY PAX attribute, but this causes libarchive to store the same xattr twice internally when reading such an archive. Therefore, converting directly from PAX to mtree produces duplicated xattrs in the output. I don't think there's an easy fix short of changing how libarchive tracks xattrs internally.
  • [x] add information to mtree(5)
  • [ ] test behavior when duplicate entries with xattrs are encountered

Comments and suggestions welcome! I'm opening this now to gauge whether upstream is interested before I go further.

maxcrees · Jun 18 '20 10:06

While testing this I discovered what might be considered a bug: when writing SCHILY.xattr PAX attributes, libarchive URL-encodes the xattr name, but when reading SCHILY.xattr back it does not decode the name. LIBARCHIVE.xattr, by contrast, URL-encodes the name on write and decodes it on read.

As far as I can tell, schily-tools tar does not encode the name at all (just as it does not encode the value) and treats it as a string, so it must not contain a NUL. ~~I think libarchive should just read/write the SCHILY name without any coding then.~~ GNU tar URL-encodes '=' and '%', so matching that behavior is probably more important.

maxcrees · Jun 18 '20 12:06

Please don't put functions into header files. We are probably looking for something like archive_base64.h + archive_base64.c.

mmatuska · Jul 17 '20 01:07

@mmatuska I've moved the functions to archive_base64.c, put the prototypes in archive_base64_private.h, and rebased onto master. Any thoughts on the other items?

maxcrees · Jul 18 '20 07:07

I think I could fetch xattrs from disk (when checkfs is enabled) in parse_file() if I rewrote it using the archive_read_disk family of functions; otherwise I would need to duplicate the #ifdef mess from there.

maxcrees · Jul 18 '20 22:07

The duplicate-xattr issue with PAX-to-mtree (and similar) conversions is a minor problem. It could potentially be fixed by storing an archive entry's xattrs in a hash table with last-one-wins semantics instead of a linked list, but I'd probably do that in a separate PR.

maxcrees · Jul 18 '20 22:07