node-bencode decode without using the entire input

Not sure if this is a bug or not but I'd like to discuss how decode should behave when we feed multiple parts.

Example:

bencode.decode('i123ei123e', 'utf8'); // 123

The input has more data than that. When we use the decode function, it silently returns just the first decoded part. Is this intended?

I saw some cases where a single buffer they wanted to decode had multiple bencoded parts (it was some data received from the network they had no control over). And when they used this lib, it was silently missing parts. So they had to implement some logic to check if the decoded result was everything the input had or not. They managed to get everything working, but I'm wondering if the lib should give a heads up (exception or whatever) when this happens.

Dec 29 '16 13:12 fasfsfgs

At first: sorry for the delay. The holidays...

I see your point. Albeit that "i123ei123e" (AFAIK) is not valid bencode (since its actually two values), there are (at least) two ways we can handle that (except the current, 'ignore it' way that ist):

a) throw an Error when there are unconsumed bytes left in the buffer after "finishing" the decode. b) provide a decode.resume() method that takes off where decode() stopped and just starts a new decode task starting at lastPosition + 1.

Are there any examples on how other parsers (JSON, YML, XML, other bencoders) handle this? As far as I am aware most of them throw an error like this:

Jan 02 '17 07:01 themasch

Sorry I don't have much experience with parsers and how they usually deal with this. I ended up using another lib that behaves pretty much like json parser - option a).

Since I'm dealing with chunks of data coming from a network connection, I had to expect chunks with incomplete bencode parts as well as chunks with multiple bencode parts and I had to try to decode them as I received them because the message saying that I could stop listening to the connection is actually in those bencoded parts.

So I did a method that received a buffer and returned two objects:

an array of successfully decoded parts
a buffer with the rest of the buffer received that didn't get decoded

Could you elaborate on how your potential resume method would be used?

Anyway, if it's not a common case, I think it's safe to just go with option a). Only if you saw that a lot of your clients had a case like mine that I'd put more logic into the lib. That's my 2c. Let me know if I can help you out with anything.

Jan 02 '17 13:01 fasfsfgs

in case we do (b), the parser would work as it currently does. There would be a bencode.decode.resume() function that checks if the internal cursor is at the end of the internal buffer (decode uses global state all over the place anyway) and if there are bytes left to be read it would just start another decode() on the remaining buffer. repeat until everything is read.

const input = buffer_from_source()

let results = [ bencode.decode(input) ]
while(let more = bencode.decode.resume()) {
  results.push(more)
}

Maybe a "canResume" method could be of good use so you can actually check if theres more data before calling resume.

Jan 13 '17 08:01 themasch

Oh I understand better your idea now. Well. If there is any way of getting at least other contributor's opinion, that would be great. I'm really not sure what's the best decision here.

ignore this issue
stateful resume
exception
other

Jan 13 '17 14:01 fasfsfgs

while strict parser should choose 3., you should currently be able to call decode.next() and get the next result. Thats pretty close to what a resume() would be.

Throwing an exception would also require a new major release since it would change the current behavior in these cases.

@jhermsmeier what do you think about a 2.0 branch that throws an exception and offers a decodeGreedy (name to be fixed) method that returns something clever that contains (or returns) all results. Like an iterator?

Jan 13 '17 15:01 themasch

Like iterators!

Oct 10 '18 09:10 jimmywarting