UTF-8 encoding not supported
Hi!
The decoder does not support all Unicode characters in strings, even though the torrent spec says strings are UTF-8 encoded. I suspect strings were originally ASCII only. I also see that some torrent files specify an encoding key, but I think we should assume UTF-8 when that key is absent.
The problem, of course, is the pieces part of the bencoded data, which has to be read as raw bytes and not as a character stream, while the rest of the bencoded data could be read as a character stream. Your library reads a binary stream using InputStream (BufferedInputStream), which is fine for ASCII chars, since 1 char = 1 byte, but not for all Unicode chars (only the first 256), because you cannot know how many bytes you need to read to get the whole char. Try decoding the string "3:Žan" and you will get burned ;)
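To make the mismatch concrete, here is a small sketch of what I mean. The class and method names are made up for illustration and are not part of your library, and I am assuming a decoder that turns every byte into exactly one char, which may not be exactly what yours does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the library.
final class Utf8LengthDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Maps every byte to exactly one char (ISO-8859-1). This is the assumed
    // behavior of a byte-per-char decoder, used here only for comparison.
    static String decodeBytePerChar(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u017Dan"; // "Žan", written with an escape to avoid source-encoding issues
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);

        System.out.println("chars: " + name.length());
        System.out.println("bytes: " + raw.length);
        System.out.println("byte-per-char round trip ok: " + decodeBytePerChar(raw).equals(name));
        System.out.println("UTF-8 round trip ok: " + new String(raw, StandardCharsets.UTF_8).equals(name));
    }
}
```

The string has 3 chars but takes 4 bytes in UTF-8, so anything that treats the two counts as the same will cut the string short or garble it.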
The way Java separates character streams from byte streams is a bit awkward, so I don't think it is possible to use BufferedReader and DataInputStream together in a single pass, because a reader buffers ahead and can consume the whole InputStream.
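Here is a rough sketch of the read-ahead behavior I am worried about. The class name and the sample input are made up; only the standard java.io classes are real.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the library.
final class ReaderReadAheadDemo {

    // Reads a single char through a BufferedReader and reports how many
    // bytes are still available in the underlying stream afterwards.
    static int bytesLeftAfterOneChar(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.read();         // consume exactly one character
        return in.available(); // what the reader has not pulled from the stream yet
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input, small enough to fit in the reader's buffer.
        byte[] data = "d4:name3:abce".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length + " bytes in, "
                + bytesLeftAfterOneChar(data) + " left after reading one char");
    }
}
```

For an input this small the reader pulls everything into its own buffer on the first read, so a DataInputStream wrapped around the same stream would have nothing left to read.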
It looks like you will need to read the file twice. On the first pass you read the character sequence and, while doing that, remember the position and length of the pieces section, so that on the second pass you just read the bytes from that offset into a byte array. This solution is not very efficient, but I don't have any other idea right now. There is also a possible problem with reading pieces through a character stream: they are not chars, so a reader could read too many bytes. I'm gonna look at other Java bencoding libs to see how they solved this problem.
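The second pass could look roughly like this. It is only a sketch: PiecesReader and readRange are hypothetical names, and it assumes the first pass somehow produced the offset and length of the pieces section in bytes, not in chars, which is exactly the part I am not sure how to get right.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper, not part of the library.
final class PiecesReader {

    // Reads `length` raw bytes starting at byte `offset`. Both values are
    // assumed to have been recorded, in bytes, during the first pass.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf); // throws EOFException if the file ends too early
            return buf;
        }
    }
}
```

Calling something like PiecesReader.readRange(torrentPath, piecesOffset, piecesLength) would then return the pieces as an untouched byte array, where torrentPath, piecesOffset and piecesLength are placeholders for whatever the first pass recorded.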