TinyCBOR Mynewt upstream
Upstreaming changes made by Mynewt and carrying over fixes
- Bring tinycbor up to date with Mynewt's tinycbor.
- Change the implementation of cbor_encoder_get_extra_bytes_needed() and cbor_encoder_get_buffer_size() to be part of the cbor_buf_writer APIs.
- Move the bytes_needed field from CborEncoder to the buf writer; it is needed mainly by the tests.
- Fix cbor_buf_cmp to do a memcmp and return the complemented result.
- iterate_string_chunks(): fix the NULL compare at the end of the string and move it out of iterate_string_chunks(), to avoid buffer-specific parser calls in that function.
- cbor_value_get_next_byte() is removed in the Mynewt version of tinycbor, so the parser tests track buffer offsets for comparison instead of calculating the offset.
- Keep the decoder and parser APIs backwards compatible.
- Add an encoder writer and a parser reader to the encoder and parser structures, so that the encoder and parser can use the new encoder_writer and decoder_reader functions without breaking backwards compatibility.
- Make the old API use flat buffers by default.
- Add APIs for initializing the encoder and parser with a custom writer and reader (see the sketch after this list).
- The cpp test now uses the tinycbor lib.
- Make the default writer and reader conditional on the NO_DFLT_READER/WRITER define, so that a default reader/writer is kept and the existing API does not change.
- Use it->offset instead of it->ptr to track buffer offsets.
- Update the parameters of the static resolve_indicator() API to take a CborValue and access offsets instead of taking pointers as input parameters.
- In validate_container(), do a byte-by-byte comparison instead of memcmp, since we no longer have direct access to the buffer. Also, use offsets instead of pointers to validate sorted maps.
- Add a new define (NO_FLOAT_SUPPORT) for conditionally compiling in float support; float support remains compiled in by default.
- Use the static_assert macro instead of Static_assert, to avoid build failures.
- Add an API to get a string chunk: a callback that buffer implementations can use to grab a string that is divided into chunks spanning multiple chained buffers.
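For context, this is roughly the shape of the pluggable writer that the encoder side gains; the names below are illustrative only and not necessarily the exact identifiers in the PR:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative writer interface: the encoder only sees the callback, so the
 * encoded bytes can land in a flat array, a chained buffer (mbuf), etc. */
struct example_encoder_writer {
    int (*write)(struct example_encoder_writer *w, const char *data, int len);
    int bytes_written;
};

/* Flat-buffer backend: the behaviour the old API keeps by default. */
struct example_buf_writer {
    struct example_encoder_writer enc;   /* must be the first member */
    uint8_t *ptr;
    const uint8_t *end;
};

static int example_buf_write(struct example_encoder_writer *w, const char *data, int len)
{
    struct example_buf_writer *b = (struct example_buf_writer *)w;
    if (b->ptr + len > b->end)
        return -1;                       /* would map to CborErrorOutOfMemory */
    memcpy(b->ptr, data, len);
    b->ptr += len;
    w->bytes_written += len;
    return 0;
}

static void example_buf_writer_init(struct example_buf_writer *b, uint8_t *buf, size_t len)
{
    b->enc.write = example_buf_write;
    b->enc.bytes_written = 0;
    b->ptr = buf;
    b->end = buf + len;
}
```

The custom-init APIs then take such a writer instead of a flat uint8_t buffer; the parser side gets an analogous reader, including the get_string_chunk callback mentioned above for strings that span chained buffers.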
@thiagomacieira thanks for the early feedback, is 1 PR with multiple, atomic, self-contained commits OK or do you explicitly want multiple PRs?
I would prefer multiple PRs if you can. If you need, you can have more than one commit in each PR too.
Thank you for the quick review @thiagomacieira, I have rebased and resolved all conflicts. I would like to understand more about how you would like to split this PR into multiple PRs. This PR basically adds support for chained buffers while keeping backwards compatibility intact.
I have a suggestion; please let me know if this is what you would like:
- Make a PR for encoder and parser changes combined
- Make a separate PR for changes to tools
- Make a separate PR for changes to tests
If you would like, I can even separate out the parser and encoder changes.
Thank you so much for going through this.
I'd like to see changes logically grouped. For example, the first commit in the series says "tinycbor bug fixes". That does not belong with the rest. Please submit bug fixes in separate PRs, one per bugfix.
Another one is "Fix Windows build". Well, the Windows build isn't broken right now, so you must have broken it in your series. Remove the breakage instead of adding a later patch to fix. Similarly for "Remove mynewt specific pkg.yml" -- there's no such file right now, so if there's a file to be removed, it's because an earlier commit added it. Remove the addition.
Ah, I see what you mean now. I have created a consolidated PR for now. I will remove any bug fixes from this and put them in separate PRs. Thank you.
Please see https://github.com/thiagomacieira/tinycbor/tree/dev for an initial API. Pay close attention to 86053d7c273da0f50f1db3b916a905f24f5eff89
@thiagomacieira Are you suggesting a change to the API that the PR adds? I have rebased and resolved the conflicts that were introduced by the most recent merges.
@vrahane thanks for your attempt at rebasing, but you mustn't have done it right. Please look at your commit's change to cbor.h: it's removing a lot of code, the new API I added in 0.5. And your commit still does too much in a single go. I'd need you to split them into separate, logical chunks.
As for the API for reading and writing outside of a linear buffer, I had a need for it so I developed my own version (without looking at yours). I am asking you to review it and provide feedback. I have yet to add docs about it, but the API is unit-tested.
Thank you for the reply. The code from cbor.h has been moved to src/cbor_buf_writer.c. The constants and enums have been moved to cbor_defs.h. Code has not been removed. There are no API changes/removals in the PR. I can try to break it up into more logical chunks.
Your version looks good to me, but I think some functionality is missing for it to work with chained buffers. Would you like to get on a call and discuss the comparison between your changes and the changes in the PR? That might be the best way to understand what each of us wants.
You need to justify those changes: why do we need new headers?
Let's schedule a call next week (can you send me an email reminding me?). I'd like to understand why it may not work with chained buffers. I've got a PoC implementation that should work with any kind of buffer.
There were compilation errors with cbor.h being included in src/cbor_buf_writer.c and src/cborparser.c. Hence the new headers, which separate out the CBOR constants and enums from the inline APIs. I do not have the exact compilation error log.
I have removed the extraneous header. It was needed earlier, but it is not needed anymore. I have also removed any unrelated changes from this PR.
I will send you an email shortly to schedule a call. One case where the API you are suggesting might not work is a string that spans multiple chained buffers. I will do a more detailed comparison between your suggested API and my PR, and we can discuss it further on the call.
What I've done for strings is basically punt the problem. Both the "dup" and "copy" string functions require linear buffers. If your buffer isn't linear, then you must use the "get_string_chunk" function, which does not try to access the string in the first place. It's up to the caller to read string from the buffers.
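To illustrate the caller side of that zero-copy path, a minimal sketch (it assumes the public chunk getter is cbor_value_get_text_string_chunk with a separate next output iterator and that the end of iteration is signalled by a null chunk pointer; the exact names and conventions in my dev branch may differ):

```c
#include "cbor.h"

/* Sketch: walk a (possibly chunked) text string without asking TinyCBOR to
 * copy it into a linear buffer; the caller consumes each chunk in place. */
static CborError process_text_string(const CborValue *value,
                                     void (*consume)(const char *chunk, size_t len))
{
    CborValue cur = *value;         /* local copy, the caller's iterator is untouched */
    const char *chunk;
    size_t len;

    do {
        CborValue next;
        CborError err = cbor_value_get_text_string_chunk(&cur, &chunk, &len, &next);
        if (err != CborNoError)
            return err;
        if (chunk != NULL)
            consume(chunk, len);    /* read straight from the input buffer */
        cur = next;                 /* continue after the chunk just returned */
    } while (chunk != NULL);

    return CborNoError;
}
```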
There was some confusion because the hash you provided was from intel/tinycbor/dev while the URL pointed to the dev branch of your fork of tinycbor.
I see what you mean. Giving it some more thought, I can try to change Mynewt's chained buffer (mbuf) API for tinycbor and test out your implementation. Would you like me to do that?
Yes, that would be helpful. Right now, I have a strawman only and more testing is required.
(Regarding the new API in https://github.com/thiagomacieira/tinycbor/tree/dev)
Just a question - how would you expect get_string_chunk() to be used with chained buffers?
static CborError get_string_chunk(CborValue *it, const void **bufferptr, size_t *len)
Maybe my understanding of this function's semantics is incorrect. Here is my understanding: a successful call causes *bufferptr to point to a buffer containing a string chunk of size *len. What if we have a 5-byte string chunk ("aaaaa") spread across two buffers, as follows:
+-------------+ +-------+
| 65 61 61 61 | --> | 61 61 |
+-------------+ +-------+
In other words, the first three characters are in the first buffer and the remaining two are in the second buffer (0x65 is the CBOR head byte for a 5-byte text string, and 0x61 is 'a').
Did you have any thoughts about how get_string_chunk() should handle a case like this?
EDIT: Also, cbor_value_advance() seems to only work with fixed length buffers (it calls _cbor_value_copy_string()). Would you agree?
That's a very good question and that's where I'd like input. It is possible with the API as I wrote it, but it's ugly.
Please note that this is more complex than what you asked. There's reading from a chained buffer and there's also reading into a chained buffer.
There are two possible implementations with the API. The internal get_string_chunk function calls the transfer_string callback with the pointer that the user passed, the number of bytes to skip and the number of bytes in the string chunk. So the callback must first skip the offset bytes, then arrange for the len bytes of the string chunk to be made available. The choice is this: it can store a pointer (or something else) in the user's buffer and then advance the buffer by len bytes, or it can do nothing.
The "do nothing" case is possible because if you're iterating chunks, then TinyCBOR will return after that, allowing the input buffer to be positioned exactly at the beginning of the string. The caller can then deal with the string directly, in zero-copy fashion. It just needs to be sure to have advanced the buffer by len before calling _cbor_value_get_string_chunk again. This is what I've done in my implementation in Qt (see https://gitlab.com/thiagomacieira/qtbase/blob/524372bc4691be05eacb845999b839a687181565/src/corelib/serialization/qcborstream.cpp#L498 and https://gitlab.com/thiagomacieira/qtbase/blob/524372bc4691be05eacb845999b839a687181565/src/corelib/serialization/qcborstream.cpp#L531).
The consequences for the API:
- _cbor_value_get_string_chunk: works. Whether the value stored in bufferptr is a pointer or something different depends on what the transfer_string callback does.
- _cbor_value_copy_string: works only if the buffer parameter is NULL and only if it's not the "do nothing" solution.
- _cbor_value_dup_string: does not work (but it calls malloc(), so you didn't want it anyway).
Let's take your 5-character example:
+-------------+ +-------+
| 65 61 61 61 | --> | 61 61 |
+-------------+ +-------+
Before get_string_chunk, the buffer state (whatever it is) points to the 0x65 byte. After the call, the two options are:
- the input buffer points to the first 0x61 byte, and len was set to 5. The caller must process 5 bytes, then advance the input buffer past the last 0x61.
- the input buffer points to after the last 0x61, len was set to 5, and buffer was set to something non-zero that allows the caller to find the first 0x61 (for example, an offset from the beginning of the buffer).
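To make the two options concrete, here is a rough sketch of a transfer_string callback over an invented chained-buffer type. The prototype is only what I described above (the user's pointer, the number of bytes to skip, and the chunk length), not the final signature:

```c
#include <stddef.h>

/* Invented chained-buffer type, standing in for e.g. an mbuf chain. */
struct chain_buf {
    struct chain_buf *next;
    const unsigned char *data;
    size_t len;
    size_t pos;                       /* read position inside this segment */
};

static void chain_skip(struct chain_buf **cur, size_t n)
{
    while (n > 0 && *cur != NULL) {
        size_t avail = (*cur)->len - (*cur)->pos;
        size_t step = n < avail ? n : avail;
        (*cur)->pos += step;
        n -= step;
        if ((*cur)->pos == (*cur)->len)
            *cur = (*cur)->next;
    }
}

/* Option 1: store something in the user's buffer, then advance past the chunk.
 * Here that "something" is the segment where the chunk starts; a real
 * implementation would also record the position inside the segment. */
static int transfer_string_store(struct chain_buf **cur, const void **userptr,
                                 size_t offset, size_t len)
{
    chain_skip(cur, offset);          /* skip up to the start of the chunk */
    *userptr = *cur;                  /* anything the caller knows how to interpret */
    chain_skip(cur, len);             /* consume the chunk itself */
    return 0;
}

/* Option 2: "do nothing": only skip the head bytes and leave the input
 * positioned at the first byte of the chunk. The caller reads the chunk
 * directly and must advance the input by len before asking for the next one. */
static int transfer_string_leave(struct chain_buf **cur, const void **userptr,
                                 size_t offset, size_t len)
{
    (void)userptr;
    (void)len;
    chain_skip(cur, offset);
    return 0;
}
```

Option 2 is what my Qt wrapper linked above relies on.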
Thanks for the explanation, Thiago. It makes sense.
When you say the transfer_string callback can "do nothing", it should still add the offset to the reader's internal pointer (or offset count), right? That seems to break cbor_value_calculate_string_length(), as it causes the parser to advance past the string descriptor. Or maybe I have misunderstood something...
No, you're correct. If it does not adjust the internal pointer, any TinyCBOR function that iterates over the string will fail.
My Qt wrapper didn't call any such function, so there was no failure. That's how it got away with "doing nothing". Or so I thought: the validate() function does need that and must be failing; it just happens not to be tested under that condition. Thanks for pointing that out.
I've also modified the writer function for the encoder to take an extra parameter, which will be different when appending the user's string. This should allow for encoding strings from chained buffers too.
OK, thanks. I think there might be an issue if the transfer_string callback changes the opaque reader object. I am not especially familiar with tinycbor, so forgive me if I am wrong about a few things. As I understand it, the cbor_value_get_[...] functions parse a value without advancing past it. This is done by making a temporary copy of the CborValue iterator and reading from it rather than from the original. However, the original CborValue object and the copy both point to the same external reader, so if transfer_string modifies the reader object, then operations performed on the temporary CborValue object make permanent changes to the external reader.
This may just be a more general case of the cbor_value_calculate_string_length() issue discussed earlier.
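A toy illustration of the aliasing I mean (not TinyCBOR code, just two iterators sharing one reader):

```c
#include <stdio.h>
#include <stddef.h>

/* Toy model: both the original iterator and its temporary copy point to the
 * same external reader state. */
struct toy_reader { size_t offset; };
struct toy_value  { struct toy_reader *reader; };

int main(void)
{
    struct toy_reader reader = { 0 };
    struct toy_value original = { &reader };

    struct toy_value tmp = original;   /* the "peek" copy a get function makes */
    tmp.reader->offset += 5;           /* transfer_string advances the shared reader */

    /* The original was supposed to stay put, but its reader moved too. */
    printf("original's reader offset: %zu\n", original.reader->offset);   /* prints 5 */
    return 0;
}
```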
I don't know if you had a chance to look at this PR. The approach it uses is to add a get_string_chunk function pointer to the reader interface (the equivalent of struct CborParserOperations in your updated API). The nice thing about having this callback is that it allows the existing string functions to work with chained buffers. If a single string chunk is split across multiple buffers, iterate_string_chunks() calls get_string_chunk() for each contiguous segment of the string.
I think there is some extraneous stuff in the Mynewt API (e.g., the get8, get16, etc. callbacks), so I'm not suggesting it is ideal. I do like the get_string_chunk callback, however.
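For reference, the reader interface in that PR has roughly this shape (illustrative names, details simplified): get_string_chunk returns the next contiguous segment, and iterate_string_chunks() keeps calling it until the whole chunk has been consumed.

```c
#include <stddef.h>

/* Simplified reader interface: get_string_chunk hands back up to *len
 * contiguous bytes starting at offset and shrinks *len to what is actually
 * contiguous in the underlying buffer. */
struct example_reader {
    const void *(*get_string_chunk)(struct example_reader *r,
                                    size_t offset, size_t *len);
    /* get8/get16/... style accessors elided */
};

/* How a chunk that straddles several chained buffers can be walked: keep
 * asking for the next contiguous segment until len bytes are consumed. */
static int walk_chunk(struct example_reader *r, size_t offset, size_t len,
                      int (*consume)(const void *seg, size_t seglen, void *arg),
                      void *arg)
{
    while (len > 0) {
        size_t seglen = len;
        const void *seg = r->get_string_chunk(r, offset, &seglen);
        if (seg == NULL || seglen == 0)
            return -1;                 /* underlying buffer ran out of data */
        if (consume(seg, seglen, arg) != 0)
            return -1;
        offset += seglen;
        len -= seglen;
    }
    return 0;
}
```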
You're almost right. Your description is strictly speaking incorrect, but the effects are the same. The cbor_value_get_xxxx_string_chunk functions call _cbor_value_get_string_chunk, which copies the input iterator to the output parameter that will be the next item and then operates exclusively on that. The incoming "current" parameter stays unmodified, in case you wanted to read more than once. But if the transfer_string callback modifies the reader, you cannot do that.
If you don't need to re-read, then it's not a problem to modify the reader.
I'll take a look at your implementation. Like I said, this is the ugliest part of the API I've designed and I needed feedback.
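In code, the call pattern is roughly this (a sketch; it assumes the public wrappers take a separate next output iterator, as described above):

```c
#include "cbor.h"

/* Sketch: read the first chunk twice from the same 'current' iterator.
 * This only works if the transfer_string callback leaves the reader alone. */
static CborError reread_first_chunk(const CborValue *current)
{
    const char *chunk;
    size_t len;
    CborValue next;

    /* 'current' is not modified; the post-chunk position goes into 'next'. */
    CborError err = cbor_value_get_text_string_chunk(current, &chunk, &len, &next);
    if (err != CborNoError)
        return err;

    /* Re-reading from the same 'current': fails if the reader was mutated. */
    return cbor_value_get_text_string_chunk(current, &chunk, &len, &next);
}
```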
I have rebased and resolved conflicts. I did the following:
In my earlier version of the PR, the encoder had added and container_size (the latter used only for containers) fields. The most recent changes in the library, specifically commit cc2bfbb20954f0be9237624b53feca0e27e88f72, remove added and add remaining, which combines the functionality of container_size (which my PR was adding) and added (which was part of the library before I opened the PR). Hope this gets us one step closer to merging the PR.
@thiagomacieira Can you please restart the Travis CI build? It seems to have gotten stuck in some intermittent state.
Ok, I like your changes, but not enough to accept them. There are minor things in the change (like structures not following my coding style), but the most important thing is that it's a large growth in size.
I need some time to study the possibilities and how they affect the code it generates. Moreover, I need to merge with my own changes I've made for dev, which already group the reading and writing. I will incorporate some of your ideas, but not all. Given my current time commitments, this should take two or three months to complete.