Add support for non-UTF-8 JSON input
Is your feature request related to a problem? Please describe.
sonic-rs fails if the input bytes contain non-UTF-8 characters, even with the pub fn from_slice<'a, T>(json: &'a [u8]) function. However, there are cases where JSON bytes that are not UTF-8 need serialize/deserialize support, typically GBK/GB18030-encoded data in China.
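To make the problem concrete, here is a minimal std-only sketch. It does not call sonic-rs; it uses std::str::from_utf8 as a stand-in for the UTF-8 validation that a UTF-8-only parser performs. The helper name sample_json and the sample document are made up for illustration.

```rust
// Sketch only: std-only, does not call sonic-rs.
// Hypothetical helper: builds the JSON text {"k":"<value>"} around the
// given raw bytes, so the same document can be produced in two encodings.
fn sample_json(value_bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"{\"k\":\"");
    out.extend_from_slice(value_bytes);
    out.extend_from_slice(b"\"}");
    out
}

fn main() {
    // The character 中 is 0xD6 0xD0 in GBK and 0xE4 0xB8 0xAD in UTF-8.
    let gbk = sample_json(&[0xD6, 0xD0]);
    let utf8 = sample_json(&[0xE4, 0xB8, 0xAD]);

    // The UTF-8 document passes validation.
    assert!(std::str::from_utf8(&utf8).is_ok());

    // The GBK document is rejected at the first GBK byte,
    // which sits right after the six ASCII bytes {"k":"
    let err = std::str::from_utf8(&gbk).unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    println!("GBK input rejected at byte {}", err.valid_up_to());
}
```

All the structural JSON bytes in the GBK document are plain ASCII; only the string payload trips the validation.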
Describe the solution you'd like
- add support for non UTF-8 encoded json bytes in from_slice function
- or drop from_slice function
Describe alternatives you've considered
N/A.
Additional context
N/A.
Hello, according to the JSON RFC, Unicode encoding is enforced.
Furthermore, do other JSON libraries such as serde_json or simd_json support non-UTF-8 input?
@PureWhiteWu sorry for the late reply.
serde_json can deserialize non-UTF-8 bytes. I have not tested simd_json.
I am aware of your design principle of adhering to the JSON standard. However, UTF-8 is not the only encoding of Unicode. If UTF-16 support is on your roadmap, I guess support for other, non-Unicode encodings could be added with little extra effort.
Moreover, the JSON standard suggests supporting non-UTF-8 encodings as an implementation extension.
One last point: GBK/GB18030 is much like UTF-8 in that it stays compatible with ASCII, which should make it easy to support.
Thanks
Thanks, could you give a test case with code? As far as I know, serde_json tolerates invalid UTF-8 only when deserializing into bytes.
@liuq19 See this repository for your convenience.
Thanks, we will investigate it.