Certain Chinese characters are emitted as \U... escapes
Hi,
I am switching from the Python-based YAML implementation to libyaml (for faster _load and _dump), and some Chinese characters do not seem to be emitted/dumped correctly (or maybe this is a feature?). This happens regardless of how the input string is written (single-quoted, double-quoted, or as a |- block scalar). The terminal output is included below.
I am aware that these characters are above 0xFFFF. (I also know that they are character components rather than full characters, in case someone wants to point out that they are not in wide current use.)
libyaml version: acd6f6f014c25e46363e718381e0b35205df2d83 (HEAD of master as of 2021.07.01)
𠂉 is changed to \U00020089
𠂤 is changed to \U000200A4
Also, when 𠂤 or 𠂉 appears in the input, the whole string is put into (double) quotes.
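To make the pattern concrete, here is a minimal Python sketch that prints the code point, UTF-8 byte length, and \U escape for the characters in the input. The needs_escape rule is only a guess inferred from the output in this report (everything above 0xFFFF is escaped); it is not taken from the libyaml source, and describe and needs_escape are hypothetical helper names.

```python
# Characters reported in this issue as being escaped on output.
ESCAPED = ["𠂉", "𠂤"]
# Characters from the same input that were emitted unchanged.
UNCHANGED = ["卌", "夕", "㐄", "舞", "阜", "灬", "無"]


def describe(ch):
    """Return (code point, UTF-8 byte length, \\UXXXXXXXX escape) for one character."""
    cp = ord(ch)
    return cp, len(ch.encode("utf-8")), f"\\U{cp:08X}"


def needs_escape(ch):
    # Assumption inferred from the observed output, not from libyaml itself:
    # code points outside the Basic Multilingual Plane get escaped.
    return ord(ch) > 0xFFFF


for ch in ESCAPED + UNCHANGED:
    cp, nbytes, esc = describe(ch)
    print(f"U+{cp:04X} utf8_bytes={nbytes} escape={esc} needs_escape={needs_escape(ch)}")
```

If the guess holds, only the two characters that need four UTF-8 bytes should be flagged, which matches what the emitter does with them.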
$ ./run-emitter -u /tmp/in.yaml
[1] Parsing, emitting, and parsing again '/tmp/in.yaml': PASSED (length: 255)
Hanzi: |-
  (卌) (𠂉) (夕㐄) (舞) [𠂤阜] (灬) (卌) (𠂉) (無) (夕㐄)
Inline: "(灬) (卌) (𠂉) (無) [𠂤阜]"
OneQuote: '(灬) (卌) (𠂉) (無) [𠂤阜]'
WontQuote: |-
  (卌) (夕㐄) (舞)
#### (length: 216)
OUTPUT:
Hanzi: "(卌) (\U00020089) (夕㐄) (舞) [\U000200A4阜] (灬) (卌) (\U00020089) (無) (夕㐄)"
Inline: "(灬) (卌) (\U00020089) (無) [\U000200A4阜]"
OneQuote: "(灬) (卌) (\U00020089) (無) [\U000200A4阜]"
WontQuote: |-
  (卌) (夕㐄) (舞)
#### (length: 255)
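The round trip is reported as PASSED, which suggests the escaped output still carries the same data. A small Python sketch can check this for the Inline value by expanding the \UXXXXXXXX escapes by hand. unescape_u8 is a hypothetical helper that handles only this one escape form, not the full set of YAML double-quoted escapes.

```python
import re

# Emitted value of the "Inline" key, copied from the output above.
emitted = r"(灬) (卌) (\U00020089) (無) [\U000200A4阜]"
# Original value of the same key, copied from the input.
original = "(灬) (卌) (𠂉) (無) [𠂤阜]"


def unescape_u8(text):
    """Replace every \\UXXXXXXXX escape with the character it names."""
    return re.sub(
        r"\\U([0-9A-Fa-f]{8})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Compare the decoded output with the original input string.
print(unescape_u8(emitted) == original)
```

So the question here is about the presentation of the emitted text (escapes and forced double quotes), not about data being lost.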
Hi, would you please provide in.yaml so we can reproduce it?