feature: support transcode to/from JSON, and to/from CBOR using different options
Is your feature request related to a problem? Please describe.
Given CBOR's use of the JSON data model as a starting point, it is often desirable to receive data in JSON and output semantically identical data to CBOR, or vice versa.
Because this package has selective method-signature compatibility with encoding/json, transcoding support would be a critical piece to CBOR adoption in some applications.
Notably, decoding JSON into any, and then re-encoding that any to CBOR is undesirable because:
- It's typically rather expensive. Consider a message publisher receiving JSON and broadcasting equivalent CBOR to save subscribers decode cost: the added cost of any-based transcoding in the publisher may outweigh the sum of savings awarded to the subscribers, thus yielding a net loss in total efficiency.
- Compressed CBOR isn't necessarily small enough compared to compressed JSON to yield wins when compression is reasonable, and compressed JSON is typically smaller than uncompressed equivalent CBOR, when the data is of non-trivial size. The cost of compressing JSON may well be cheaper than the cost of any-transcoding to CBOR without compression.
- The conversion through any may be lossy (decode JSON number to float64), or non-lossy-but-incompatible without extra work (decode to json.Number, which gets CBOR encoded as a string).
Describe the solution you'd like
Either within this package, or as a subpackage, functionality which:
- Iteratively tokenizes JSON and produces a []byte (or writes to an io.Writer) with equivalent CBOR data. The JSON decode would probably internally use (*json.Decoder).Token.
- Equivalent CBOR to JSON transcode functionality.
- CBOR-to-CBOR re-encode, e.g. switch from indefinite length to definite length, different timestamp formats, modes, etc.
- Ideally, an option to control JSON timestamp detection (e.g. if a string decodes as an RFC 3339 timestamp, encode it as a CBOR timestamp).
- Ideally, an option to control indefinite vs definite length value encoding. Indefinite length would certainly use less memory and encode faster because it needs no lookahead or buffering. Definite length could be supported with two-pass decoding, or by maintaining look-back indices (e.g. allocate all arrays/maps/(byte)strings as if they need heuristic n-byte lengths, then re-encode the initial byte(s) alongside a memcpy to shift data bytes left or right once the actual size is known).
Today I figured out how to add a tag so that json.Number values round-trip correctly. Is this the recommended approach?
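// Register json.Number under tag 284 for both encoding and decoding.
tags := cbor.NewTagSet()
tags.Add( // returns an error, ignored here for brevity
	cbor.TagOptions{EncTag: cbor.EncTagRequired, DecTag: cbor.DecTagRequired},
	reflect.TypeOf(json.Number("")),
	284,
)
// Build modes that share the tag set; errors are ignored for brevity.
em, _ := cbor.EncOptions{}.EncModeWithTags(tags)
dm, _ := cbor.DecOptions{DefaultMapType: reflect.TypeOf(map[string]any(nil))}.DecModeWithTags(tags)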
This configuration increases compatibility with values decoded by encoding/json with UseNumber.
For the tag number I used 284, which the registry reserves for JSON numbers as strings.
I updated the above example, as last week the CBOR project and IANA accepted JSON Number String Tag for CBOR and assigned it tag 284.