msgpack

Question/suggestion regarding the length of arrays and maps

Open rotorz opened this issue 10 years ago • 11 comments

The specification requires that the number of elements inside the array or map is declared ahead of time immediately after the format code.

Would it be useful to define a format code which marks the end of an array or map so that the number of elements doesn't need to be defined? This would make it easier to send arrays and maps of unknown lengths across a stream.

+--------+~~~~~~~~~~~~~~~~~+--------+
|???????0|    N objects    |???????1|
+--------+~~~~~~~~~~~~~~~~~+--------+

where
* ???????0 is the format code for the start of an array
* ???????1 is the format code for the end of an array
* N is some unspecified number of objects

rotorz avatar May 13 '15 10:05 rotorz

Let's call a format like "?0 data ?1" a tag pattern. There may also be a need for a null-terminated pattern. From MsgPack's point of view, a better solution is to send the data in fixed-length blocks.

For data of unknown length, sending without a length is easy, but the result is a pain for the receiver.
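A minimal sketch of the fixed-length-block idea, in Python. It assumes elements are integers in 0..127 (so each fits a positive fixint), and it assumes an application-level convention of sending nil (0xc0) after the last block; neither the nil marker nor the helper names below come from the MessagePack spec.

```python
import struct
from itertools import islice


def array_header(n):
    """Encode a MessagePack array header declaring n elements."""
    if n <= 15:
        return bytes([0x90 | n])                # fixarray
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)   # array 16
    return b"\xdd" + struct.pack(">I", n)       # array 32


def encode_small_int(value):
    """Encode 0..127 as a positive fixint (all this sketch supports)."""
    if not 0 <= value <= 0x7F:
        raise ValueError("sketch only handles integers in 0..127")
    return bytes([value])


def encode_in_blocks(items, block_size):
    """Yield one complete array per block of at most block_size items.

    The source may be a generator of unknown length; only one block is
    buffered at a time. A final nil byte marks the end of the stream
    (an assumed convention, not part of the format).
    """
    it = iter(items)
    while True:
        block = list(islice(it, block_size))
        if not block:
            break
        body = b"".join(encode_small_int(v) for v in block)
        yield array_header(len(block)) + body
    yield b"\xc0"  # nil as end-of-stream marker (assumption)
```

Each yielded chunk is a self-contained MessagePack value, so a standard decoder on the receiving side can consume the chunks one by one; only the last array may be shorter than `block_size`.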

lyrachord avatar Mar 03 '16 02:03 lyrachord

I think supporting every use case makes msgpack complicated. I don't want to use a complicated format.

methane avatar Mar 03 '16 02:03 methane

The current design requires the whole msgpack-encoded data to be held in memory at the same time. Sending/receiving without a length would allow msgpack to be used as a streaming encoding format.

rinick avatar Jun 27 '17 19:06 rinick

One way to achieve this is to use the unused tag 11000001 (0xc1) as the terminator of the current Map/Array.

For example, if I encode an array with size 0xffffffff and reach the end of my array after encoding 10 objects, I write a 11000001. The parser then knows the previous object was the last one, and the next object won't be part of the array.
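A rough sketch of how a parser might handle this, in Python. It is only an illustration of the proposal: 0xc1 has no such meaning in the current spec, the function and constant names are made up for this example, and elements are restricted to positive fixints to keep it short.

```python
import io
import struct

TERMINATOR = 0xC1            # unused byte; a terminator only under this proposal
UNKNOWN_LENGTH = 0xFFFFFFFF  # declared size used when the real size is unknown


def read_terminated_array(stream):
    """Read an array 32 that may end early with the proposed terminator.

    Stops at the declared count or at TERMINATOR, whichever comes first.
    """
    head = stream.read(5)
    if len(head) != 5 or head[0] != 0xDD:
        raise ValueError("expected an array 32 header")
    (declared,) = struct.unpack(">I", head[1:])

    items = []
    while len(items) < declared:
        byte = stream.read(1)
        if not byte:
            raise ValueError("stream ended inside the array")
        if byte[0] == TERMINATOR:
            break  # end of array; the next object belongs to the parent
        if byte[0] > 0x7F:
            raise ValueError("sketch only decodes positive fixint elements")
        items.append(byte[0])
    return items


# Ten elements, the terminator, then an unrelated object (42).
data = struct.pack(">BI", 0xDD, UNKNOWN_LENGTH) + bytes(range(10)) + b"\xc1\x2a"
stream = io.BytesIO(data)
decoded = read_terminated_array(stream)
following = stream.read(1)
```

Here `decoded` holds the ten elements and `following` is the byte of the next object, which the parser leaves untouched.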

rinick avatar Jun 27 '17 23:06 rinick

+1 to @lyrachord: chunking your object stream into fixed-length arrays seems fair enough, plus maybe finishing the stream with null or something like that. At least it's not the job of the serialization layer. But if it's really important, it could be fair to add original extension types and wait for implementations.

I wonder what kind of problem adding a new format code solves. Does it pay for the performance penalty of iterating over a stream of unknown length? Is it worth adding code to, or changing the design of, all implementations? The number of unknown lengths should be as small as possible, ideally only the length of the byte stream.

kuenishi avatar Jun 28 '17 07:06 kuenishi

Would love this!

I was looking to replace JSON in our stack with MessagePack but can't because we stream JSON out to clients.

Because the element count is unknown (1 to thousands in my case) all elements would need to be buffered in memory, counted, and then serialized. This also buffers the whole response and keeps the client waiting when they could be receiving and parsing the response.

For now I will stick to JSON 😞

malomalo avatar Oct 12 '17 00:10 malomalo

+1 on this, same situation as @malomalo.

uwburn avatar Jan 26 '18 14:01 uwburn

@malomalo Actually, the receiver could process the current format in a streaming way, just counting the objects as they arrive and comparing the count to the number from the header.
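A minimal sketch of such a streaming receiver, in Python, assuming a file-like `stream` and elements that are positive fixints (the helper names are invented for this example). Each element is yielded as soon as it is read, and the header count only decides when to stop.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or fail if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("stream ended before the declared count was reached")
    return data


def iter_array(stream):
    """Yield the elements of one MessagePack array as they arrive."""
    tag = read_exact(stream, 1)[0]
    if 0x90 <= tag <= 0x9F:
        count = tag & 0x0F                                    # fixarray
    elif tag == 0xDC:
        (count,) = struct.unpack(">H", read_exact(stream, 2))  # array 16
    elif tag == 0xDD:
        (count,) = struct.unpack(">I", read_exact(stream, 4))  # array 32
    else:
        raise ValueError("not an array header")

    for _ in range(count):
        byte = read_exact(stream, 1)[0]
        if byte > 0x7F:
            raise ValueError("sketch only decodes positive fixint elements")
        yield byte  # handed to the caller before the next element is read


received = list(iter_array(io.BytesIO(b"\x93\x01\x02\x03")))
```

Because `iter_array` is a generator, the caller can start processing the first element without the rest of the array being in memory.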

But on the sender side, I see that this can prevent some use cases.

markusschaber avatar Feb 07 '18 16:02 markusschaber

@markusschaber Correct. I'm using JSON and msgpack to stream big DB result sets (which are streamed as well). With JSON, I don't need to know at the beginning of the streaming how many elements I will serialize and stream; with MessagePack, I needed to run another query to retrieve the count before starting the streaming. In my case it was rather easy to work around, and without a big performance penalty from the additional query, but there are situations where it can be annoying, or not possible, to do without buffering the whole array before starting the serialization.

uwburn avatar Feb 07 '18 16:02 uwburn

@markusschaber Correct, the format doesn't block the receiver. I don't think you need to compare the count with the number from the header; if they're not equal, MessagePack should throw an error when it tries to process the remaining data or when the data ends prematurely.

The issue is only on the sender side and is unavoidable.

I'm not sure why it's done this way; I think it uses more space than just having a start byte and an end byte. I would still love to use MessagePack, but I'm still on the JSON boat because of the ability to stream out to the client.
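As a rough check of the space question, the header sizes of the current array formats (fixarray, array 16, array 32) can be compared with a hypothetical two-byte start/end scheme. The two-byte figure is an assumption about the proposal, not something defined anywhere.

```python
def array_header_size(n):
    """Bytes used by the current array header for n elements."""
    if n <= 15:
        return 1  # fixarray: count lives in the format byte
    if n <= 0xFFFF:
        return 3  # array 16: format byte + 16-bit count
    if n <= 0xFFFFFFFF:
        return 5  # array 32: format byte + 32-bit count
    raise ValueError("too many elements for a MessagePack array")


DELIMITED_OVERHEAD = 2  # hypothetical: one start byte + one end byte

for count in (10, 1000, 100000):
    print(count, array_header_size(count), DELIMITED_OVERHEAD)
```

Under these assumptions the length prefix is one byte smaller for arrays of up to 15 elements, one byte larger up to 65535 elements, and three bytes larger beyond that.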

malomalo avatar Feb 08 '18 04:02 malomalo

Just as a side note: It could be worse. :-) In one of our use cases, it helps a lot that we only need to know the count of elements in advance, not their size in bytes. We have all the metadata (imagine a list of files) beforehand, but we pick the files one by one from storage and format them using MessagePack. So we actually can use it in a streaming way (as we know the count, but not the formatted size of the elements).
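A minimal sketch of that pattern, in Python, assuming each element is a raw byte payload encoded with the bin family and that `payloads` is a lazy iterable (for example, a generator reading files one at a time). The function names are invented for this example.

```python
import io
import struct


def array_header(n):
    """Encode a MessagePack array header declaring n elements."""
    if n <= 15:
        return bytes([0x90 | n])                # fixarray
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)   # array 16
    return b"\xdd" + struct.pack(">I", n)       # array 32


def bin_value(payload):
    """Encode a byte string as bin 8, bin 16 or bin 32."""
    n = len(payload)
    if n <= 0xFF:
        return b"\xc4" + bytes([n]) + payload
    if n <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", n) + payload
    return b"\xc6" + struct.pack(">I", n) + payload


def stream_array(out, count, payloads):
    """Write the header first, then each element as it becomes available.

    Only the element count must be known up front; the total encoded
    size is never needed.
    """
    out.write(array_header(count))
    written = 0
    for payload in payloads:
        if written == count:
            raise ValueError("more elements than declared in the header")
        out.write(bin_value(payload))
        written += 1
    if written != count:
        raise ValueError("fewer elements than declared in the header")


out = io.BytesIO()
stream_array(out, 2, (p for p in [b"ab", b""]))
encoded = out.getvalue()
```

If the source produces a different number of elements than declared, the output is already partly written, so the sender can only abort; that is the trade-off of committing to the count in the header.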

markusschaber avatar Feb 08 '18 07:02 markusschaber