bacnet4j-wrapper

Garbled Chinese text

Open · qaqRose opened this issue 1 year ago · 8 comments

I tried to read the description of a deviceObject, but the Chinese text in the output is garbled. How should I handle its encoding?

[screenshot]

[screenshot]

qaqRose · Apr 16 '24 02:04

This does not seem to be a programming problem: the same garbled output also appears in another tool, YABE.

[screenshot]

qaqRose · Apr 16 '24 02:04

Hello @qaqRose, it doesn't have to be like that. I believe we can introduce or configure a text encoding in the library to avoid this issue.

splatch · Apr 16 '24 06:04

I suppose it boils down to this code: MangoAutomation/BACnet4J#55. Note that it can also be addressed above BACnet4J by post-processing the CharacterString. Could you please provide the byte representation of the string you tested with, along with the Chinese characters it should render? A Wireshark capture should contain the string plus an extra octet that declares its text encoding.
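The post-processing route mentioned above could look roughly like the sketch below. It is not code from this thread: `CharacterStringRedecoder` is a hypothetical helper (not part of BACnet4J or bacnet4j-wrapper) that works on the raw CharacterString payload, meaning the character-set octet followed by the text octets. When the device declares character set 0x00 (UTF-8) but the octets are not valid UTF-8, it retries with a user-configured fallback charset. Choosing GBK as that fallback is an assumption based on the capture posted later in this thread, and `Charset.forName("GBK")` assumes a JDK that ships the extended charsets.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of BACnet4J or bacnet4j-wrapper.
final class CharacterStringRedecoder {
    // BACnet character-set octets handled by this sketch.
    static final int CS_UTF_8 = 0x00;      // ISO 10646 UTF-8 (formerly ANSI X3.4)
    static final int CS_UCS_2 = 0x04;      // ISO 10646 UCS-2, read as UTF-16BE
    static final int CS_ISO_8859_1 = 0x05;

    private final Charset fallback;

    CharacterStringRedecoder(Charset fallback) {
        this.fallback = fallback;
    }

    /** Decodes a payload: one character-set octet followed by the encoded text. */
    String decode(byte[] payload) {
        int charset = payload[0] & 0xFF;
        byte[] text = Arrays.copyOfRange(payload, 1, payload.length);
        switch (charset) {
            case CS_UTF_8:
                try {
                    return strict(StandardCharsets.UTF_8, text);
                } catch (CharacterCodingException notUtf8) {
                    // Declared UTF-8 but malformed: assume the device used the fallback code page.
                    try {
                        return strict(fallback, text);
                    } catch (CharacterCodingException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            case CS_UCS_2:
                return new String(text, StandardCharsets.UTF_16BE);
            case CS_ISO_8859_1:
                return new String(text, StandardCharsets.ISO_8859_1);
            default:
                throw new IllegalArgumentException("Unsupported BACnet character set: " + charset);
        }
    }

    // Decodes without substituting U+FFFD, so invalid input raises an exception instead.
    private static String strict(Charset cs, byte[] bytes) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Parses comma-separated hex octets such as "00,33,ba". */
    static byte[] hex(String csv) {
        String[] parts = csv.split(",");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i].trim(), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        CharacterStringRedecoder decoder = new CharacterStringRedecoder(Charset.forName("GBK"));
        // Description payload of analog-output 16 from the capture, tag header removed.
        byte[] payload = hex("00,33,ba,c5,d6,f7,bb,fa,c0,e4,b6,b3,b7,a7,bf,d8,d6,c6,c4,a3,ca,bd");
        System.out.println(decoder.decode(payload));
    }
}
```

Strict decoding is the design choice that matters here: Java's `new String(bytes, UTF_8)` silently replaces bad sequences with U+FFFD, so the fallback would never trigger. A heuristic like this can still misfire on short strings that happen to be valid in both encodings, which is why an explicit per-device setting may be preferable.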

splatch · Apr 16 '24 15:04

Hello @splatch, thank you for your reply. I have attached some Chinese test strings and Wireshark capture files, which mainly cover the BACnet client connecting, device discovery, and reading BacNetObject information.

Tip: all of the fields with garbled Chinese text are descriptions.

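Examples:

  analog-output,15   3号主机冷却阀控制模式   (unit 3 cooling-water valve control mode)
  analog-output,16   3号主机冷冻阀控制模式   (unit 3 chilled-water valve control mode)
  analog-output,17   3号主机控制模式   (unit 3 control mode)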

[screenshot]

bacnet.zip

qaqRose · Apr 17 '24 06:04

Hello @splatch, it seems to be caused by MBCS not being supported. When I switched to UTF-8, the problem went away.

[screenshot]

qaqRose · Apr 17 '24 06:04

I've run a basic test with the byte representation of your text, and it doesn't match. The device-side encoding is probably wrong, or at least not UTF-8, for the description of analog output 16 (packet 555 in your capture/screenshot).

Wireshark input: [75,16,0,33,ba,c5,d6,f7,bb,fa,c0,e4,b6,b3,b7,a7,bf,d8,d6,c6,c4,a3,ca,bd]

Test code:

CharacterString str = new CharacterString(Encodings.ANSI_X3_4, "3号主机冷冻阀控制模式");
ByteQueue bq = new ByteQueue();
bq.push(new byte[] {0x75, 0x16}); // tag header: application tag 7 (CharacterString), length 0x16 = 22 octets
str.writeImpl(bq);
System.out.println(new CharacterString(bq)); // prints 3号主机冷冻阀�

The output is not 1:1 with your input, but what worries me, and leads me to assume the device side might be wrong, is the byte representation of the string, at least its beginning: [75,16,0,33,e5,8f,b7,e4,b8,bb,e6,9c,ba,e5,86,b7,e5,86,bb,e9,98,80,e6,8e,a7,e5,88,b6,e6,a8,a1,e5,bc,8f]. The header 0x75, 0x16 is the same, followed by 0x00, which indicates the encoding (UTF-8), and 0x33, which stands for "3" in UTF-8. Everything after that is off track.
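To see both readings of the captured octets side by side, the sketch below parses the Wireshark input quoted above using only the JDK (`TagDump` is a hypothetical name, not a BACnet4J API). 0x75 is application tag 7 (CharacterString) with length/value/type 5, meaning the length is in the next octet, 0x16 = 22. The first content octet 0x00 declares UTF-8. Decoding the remaining 21 octets as UTF-8 yields replacement characters, whereas decoding them as GBK yields the expected text, two octets per Chinese character. GBK is an assumption here: it is code page 936, the usual multi-byte code page on Simplified Chinese Windows.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for inspecting a captured BACnet CharacterString; not a BACnet4J API.
final class TagDump {
    /** Parses comma-separated hex octets such as "75,16,00". */
    static byte[] hex(String csv) {
        String[] parts = csv.split(",");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i].trim(), 16);
        }
        return out;
    }

    /** Returns the content octets (character-set octet + text) of an application-tagged CharacterString. */
    static byte[] content(byte[] tagged) {
        int octet = tagged[0] & 0xFF;
        int tagNumber = octet >> 4;              // upper nibble: tag number (7 = CharacterString)
        boolean contextSpecific = (octet & 0x08) != 0;
        int lvt = octet & 0x07;                  // length/value/type
        if (tagNumber != 7 || contextSpecific) {
            throw new IllegalArgumentException("not an application CharacterString tag");
        }
        int offset = 1;
        int length = lvt;
        if (lvt == 5) {                          // extended length in the next octet
            length = tagged[offset++] & 0xFF;
            if (length >= 254) {
                throw new UnsupportedOperationException("2- and 4-octet lengths not handled");
            }
        }
        if (offset + length > tagged.length) {
            throw new IllegalArgumentException("truncated CharacterString");
        }
        return Arrays.copyOfRange(tagged, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] tagged = hex("75,16,00,33,ba,c5,d6,f7,bb,fa,c0,e4,b6,b3,b7,a7,bf,d8,d6,c6,c4,a3,ca,bd");
        byte[] content = content(tagged);
        byte[] text = Arrays.copyOfRange(content, 1, content.length);
        System.out.println("content octets: " + content.length + ", character set: " + content[0]);
        System.out.println("as UTF-8: " + new String(text, StandardCharsets.UTF_8));
        System.out.println("as GBK:   " + new String(text, Charset.forName("GBK")));
    }
}
```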

Other devices/objects in your dump look fine. It doesn't seem to be a library issue, especially since the text encoding declared by the device is UTF-8.

splatch · Apr 17 '24 21:04

Yes, the encoding on the device side is not UTF-8; I don't know why Wireshark displays it as UTF-8. This is the capture after I configured the device side to use UTF-8.

[screenshot]

CharacterString str = new CharacterString(CharacterString.Encodings.ANSI_X3_4, "3号主机冷冻阀控制模式");
ByteQueue bq = new ByteQueue();
bq.push(new byte[] {0x75, 0x20}); // tag header: application tag 7 (CharacterString), length 0x20 = 32 octets
str.writeImpl(bq);
System.out.println(new CharacterString(bq)); // prints 3号主机冷冻阀控制模式

When I change the length octet in the header from 0x16 to 0x20, the text prints correctly.
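The arithmetic behind 0x20: the length in the tag header counts the character-set octet plus the encoded text. In UTF-8, "3" takes one octet and each of the ten Chinese characters takes three, so the content is 1 + 1 + 30 = 32 = 0x20 octets. With 0x16 (22), the reader stops after 3号主机冷冻阀 plus the first two octets of 控 (e6 8e), which matches the trailing � in the earlier test. A minimal sketch of deriving the header from the text instead of hard-coding it is below; `CharacterStringEncoder` is a hypothetical name, and only the short and one-octet extended length forms are implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a BACnet4J API: derives the tag header length from the encoded text.
final class CharacterStringEncoder {
    /** Encodes text as an application-tagged CharacterString with character set 0x00 (UTF-8). */
    static byte[] encodeUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int length = 1 + utf8.length;            // character-set octet + text octets
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (length <= 4) {
            out.write(0x70 | length);            // tag 7, application class, length in the low 3 bits
        } else if (length <= 253) {
            out.write(0x75);                     // tag 7, LVT 5: length in the next octet
            out.write(length);
        } else {
            throw new UnsupportedOperationException("lengths above 253 need the 0xFE/0xFF forms");
        }
        out.write(0x00);                         // character set: UTF-8
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeUtf8("3号主机冷冻阀控制模式");
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```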

qaqRose · Apr 23 '24 01:04

Capture with the device set to UTF-8:

utf8_bacnet.zip

qaqRose · Apr 23 '24 01:04