Help text has wrong encoding when "Beta: Use Unicode UTF-8 for worldwide language support" is enabled

Open everything411 opened this issue 4 years ago • 14 comments

Describe the bug: I'm using the Chinese Simplified version of Windows 10 with "Beta: Use Unicode UTF-8 for worldwide language support" enabled. The help text in java is still encoded in GBK, so it cannot be displayed correctly. Other text printed by java.exe is also affected. I don't know whether this happens for other languages.

I also tried AdoptOpenJDK, and it has the same problem. So maybe this is an upstream bug? Where should I report it?

Thanks

Steps to reproduce the behavior:

  1. Windows 10, Chinese Simplified
  2. Enable "Beta: Use Unicode UTF-8 for worldwide language support" in control panel
  3. java --help
  4. See error

Expected behavior: Help text is shown correctly.

Screenshots: screenshot

Output of java --help, with the help text encoded in GBK while the system is using UTF-8: java-help.txt

everything411 avatar May 16 '21 08:05 everything411

Hi @everything411 - If this is also occurring with AdoptOpenJDK then the likely issue is with OpenJDK itself (or some common configuration that you need to set). We'll see if we can reproduce and advise on next steps (or submit an upstream issue on your behalf).

karianna avatar May 20 '21 22:05 karianna

@gdams could you take a look into this please and verify if this also occurs on Adoptium?

brunoborges avatar Sep 13 '21 17:09 brunoborges

Fixed in the latest Adoptium JDK17-beta. The encoding is still wrong for Adoptium JDK16, JDK11 and JDK8. Screenshots: jdk17, jdk11, jdk16

everything411 avatar Sep 14 '21 01:09 everything411

@everything411 could you please check if this happens with the MS Build of OpenJDK binaries? In which versions does the problem appear, and in which does it not?

brunoborges avatar Oct 07 '21 18:10 brunoborges

@brunoborges

MS Build of OpenJDK 17: the problem doesn't appear
MS Build of OpenJDK 11: the problem appears

So it seems that this problem is fixed in upstream JDK 17 but not in other versions of the JDK?

everything411 avatar Oct 08 '21 01:10 everything411

@everything411 thanks for testing! If you don't mind one final question: is there an OpenJDK 11 build that you've seen that doesn't have this problem?

Maybe Zulu, or Oracle JDK?

brunoborges avatar Oct 08 '21 01:10 brunoborges

Hi @everything411 @cyhhao

Could you please check if this issue is still happening with the packages published at microsoft.com/openjdk ?

brunoborges avatar May 03 '22 21:05 brunoborges

@brunoborges

> chcp
Active code page: 65001
java11
> java --version
openjdk 11.0.14.1 2022-02-08 LTS
OpenJDK Runtime Environment Microsoft-31205 (build 11.0.14.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-31205 (build 11.0.14.1+1-LTS, mixed mode)
java17
> java --version
openjdk 17.0.3 2022-04-19 LTS
OpenJDK Runtime Environment Microsoft-32931 (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-32931 (build 17.0.3+7-LTS, mixed mode, sharing)

The same result as before: OK for JDK 17 and bad encoding for JDK 11.

I also noticed that javac's help text is still broken in both JDK 11 and JDK 17. The text is GBK-encoded and then printed to the UTF-8 console, which produces the "�" characters.

> javac
�÷�: javac <options> <source files>
����, ���ܵ�ѡ�����:
  @<filename>                  ���ļ���ȡѡ����ļ���
  -Akey[=value]                ���ݸ�ע�ʹ�������ѡ��
  --add-modules <�>(,<�>)*
        ���˳�ʼģ��֮��Ҫ�����ĸ�ģ��; ��� <module>
                Ϊ ALL-MODULE-PATH, ��Ϊģ��·���е�����ģ�顣
  --boot-class-path <path>, -bootclasspath <path>
        �����������ļ���λ��

The encoding of compile error messages is bad too: GBK-encoded text printed to the UTF-8 console.

> java .\test.java
.\test.java:7: ����: δ������쳣����FileNotFoundException; ���������в���������Ա��׳�
                InputStreamReader fileReader = new InputStreamReader(new FileInputStream(new File("not exist")), StandardCharsets.UTF_8);
                                                                     ^
1 ������
错误: 编译失败
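
The "�" and "÷" characters in the javac output above are what one would expect when GBK bytes are interpreted as UTF-8. The following is a minimal sketch of that mismatch, not code from the JDK: the class `ConsoleMismatchDemo` and the helper `showAsUtf8` are hypothetical names, and the Chinese word for "Usage" (用法) is written with Unicode escapes so that the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleMismatchDemo {
    // Hypothetical helper: encode the text as GBK, then decode those bytes
    // as UTF-8, which is how a UTF-8 console would interpret GBK output.
    static String showAsUtf8(String text) {
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Usage" in Chinese, the first word of javac's help text.
        String usage = "\u7528\u6cd5";
        String shown = showAsUtf8(usage);
        // Print the code points instead of the characters so the result
        // is readable regardless of the console encoding.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The GBK bytes for these two characters are D3 C3 B7 A8. Only the middle pair C3 B7 forms a valid UTF-8 sequence (÷); the other two bytes become U+FFFD, which matches the "�÷�" at the start of the javac output.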

I also found that the encoding of runtime exception messages is OK for JDK 11 but bad for JDK 17.

For JDK 11: "系统找不到指定的文件" means "The system cannot find the file specified" in English (the Windows counterpart of "No such file or directory").

> java .\test.java
Exception in thread "main" java.io.FileNotFoundException: not exist (系统找不到指定的文件)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)

For JDK 17, "绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�" is meaningless. It appears to be the text "系统找不到指定的文件" encoded in UTF-8 and wrongly decoded as GBK; the GBK-decoded text is then encoded in UTF-8 and printed to the UTF-8 console.

> java .\test.java
Exception in thread "main" java.io.FileNotFoundException: not exist (绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)
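
The suspected double conversion described above can be reproduced in a few lines. This is only a sketch of the hypothesis, not the actual code path inside the JDK: `MojibakeDemo`, `misdecode` and `repair` are hypothetical names, and the message is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    private static final Charset GBK = Charset.forName("GBK");

    // Reproduce the suspected bug: UTF-8 bytes wrongly decoded as GBK.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), GBK);
    }

    // Reverse it: re-encode the garbled text as GBK and decode as UTF-8.
    // This only works when no byte was lost to a replacement character.
    static String repair(String garbled) {
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Windows "file not found" message quoted in the JDK 11 output.
        String original =
            "\u7cfb\u7edf\u627e\u4e0d\u5230\u6307\u5b9a\u7684\u6587\u4ef6";
        String garbled = misdecode(original);
        // 10 characters are 30 UTF-8 bytes, which GBK reads as 15 pairs.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The trailing "銆�" in the JDK 17 output is consistent with this explanation if the message ends with an ideographic full stop: its three UTF-8 bytes cannot be split into whole GBK pairs, so the last byte is replaced and cannot be recovered by `repair`.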

everything411 avatar May 05 '22 13:05 everything411

output of java.exe -XshowSettings:properties -version for jdk11

> java.exe -XshowSettings:properties -version
Property settings:
    awt.toolkit = sun.awt.windows.WToolkit
    file.encoding = GBK
    file.separator = \
    java.awt.graphicsenv = sun.awt.Win32GraphicsEnvironment
    java.awt.printerjob = sun.awt.windows.WPrinterJob
    java.class.path =
    java.class.version = 55.0
    java.home = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 11.0.14.1+1-LTS
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 11
    java.vendor = Microsoft
    java.vendor.url = https://www.microsoft.com
    java.vendor.url.bug = https://github.com/microsoft/openjdk/issues
    java.vendor.version = Microsoft-31205
    java.version = 11.0.14.1
    java.version.date = 2022-02-08
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 11
    java.vm.vendor = Microsoft
    java.vm.version = 11.0.14.1+1-LTS
    jdk.debug = release
    line.separator = \r \n
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Microsoft\jdk-11.0.14.101-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.desktop = windows
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = cp65001
    sun.stdout.encoding = cp65001
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.timezone =
    user.variant =

openjdk version "11.0.14.1" 2022-02-08 LTS
OpenJDK Runtime Environment Microsoft-31205 (build 11.0.14.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-31205 (build 11.0.14.1+1-LTS, mixed mode)

output of java.exe -XshowSettings:properties -version for jdk17

    file.encoding = GBK
    file.separator = \
    java.class.path =
    java.class.version = 61.0
    java.home = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 17.0.3+7-LTS
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 17
    java.vendor = Microsoft
    java.vendor.url = https://www.microsoft.com
    java.vendor.url.bug = https://github.com/microsoft/openjdk/issues
    java.vendor.version = Microsoft-32931
    java.version = 17.0.3
    java.version.date = 2022-04-19
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode, sharing
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 17
    java.vm.vendor = Microsoft
    java.vm.version = 17.0.3+7-LTS
    jdk.debug = release
    line.separator = \r \n
    native.encoding = GBK
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Microsoft\jdk-17.0.3.7-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = UTF-8
    sun.stdout.encoding = UTF-8
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.variant =

openjdk version "17.0.3" 2022-04-19 LTS
OpenJDK Runtime Environment Microsoft-32931 (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-32931 (build 17.0.3+7-LTS, mixed mode, sharing)
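
For comparing builds, the encoding-related entries of the listings above can also be read from inside a program. The sketch below is an illustration only: `EncodingReport` and `collect` are hypothetical names, the `sun.*` properties are internal and may be missing or change between releases, and `native.encoding` does not appear in the JDK 11 listing, so any of the values may be null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Property names as they appear in the -XshowSettings:properties output.
    static final String[] KEYS = {
        "file.encoding",
        "native.encoding",
        "sun.jnu.encoding",
        "sun.stdout.encoding",
        "sun.stderr.encoding"
    };

    // Collect the properties in a stable order; missing ones map to null.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        for (String key : KEYS) {
            report.put(key, System.getProperty(key));
        }
        // The charset used by APIs that take no explicit charset argument.
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In the listings above, the JDK 11 and JDK 17 builds report `file.encoding = GBK` while the console encodings are cp65001 or UTF-8, which is the mismatch discussed in this thread.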

everything411 avatar May 05 '22 14:05 everything411

I tried Temurin JDK 18, and both java and javac are OK.

> java.exe
用法:java [options] <主类> [args...]
           (执行类)
   或  java [options] -jar <jar 文件> [args...]
           (执行 jar 文件)
   或  java [options] -m <模块>[/<主类>] [args...]
       java [options] --module <模块>[/<主类>] [args...]
           (执行模块中的主类)
   或  java [options] <源文件> [args]
           (执行单个源文件程序)

> javac.exe
用法: javac <options> <source files>
其中, 可能的选项包括:
  @<filename>                  从文件读取选项和文件名
  -Akey[=value]                传递给注释处理程序的选项
  --add-modules <模块>(,<模块>)*
        除了初始模块之外要解析的根模块; 如果 <module>
                为 ALL-MODULE-PATH, 则为模块路径中的所有模块。
  --boot-class-path <path>, -bootclasspath <path>
        覆盖引导类文件的位置

> java.exe .\test.java
.\test.java:5: 错误: 未报告的异常错误FileNotFoundException; 必须对其进行捕获或声明以便抛出
                InputStreamReader fileReader = new InputStreamReader(new FileInputStream(new File("not exist")), StandardCharsets.UTF_8);
                                                                     ^
1 个错误
错误: 编译失败

However, runtime exception messages are still garbled, the same problem as with JDK 17.

Exception in thread "main" java.io.FileNotFoundException: not exist (绯荤粺鎵句笉鍒版寚瀹氱殑鏂囦欢銆�)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at Test.main(test.java:5)

output of java.exe -XshowSettings:properties -version for jdk18

Property settings:
    file.encoding = UTF-8
    file.separator = \
    java.class.path =
    java.class.version = 62.0
    java.home = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot
    java.io.tmpdir = C:\Users\EVERYT~1\AppData\Local\Temp\
    java.library.path = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot\bin
(omitted)
        .
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 18.0.1+10
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 18
    java.vendor = Eclipse Adoptium
    java.vendor.url = https://adoptium.net/
    java.vendor.url.bug = https://github.com/adoptium/adoptium-support/issues
    java.vendor.version = Temurin-18.0.1+10
    java.version = 18.0.1
    java.version.date = 2022-04-19
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode, sharing
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 18
    java.vm.vendor = Eclipse Adoptium
    java.vm.version = 18.0.1+10
    jdk.debug = release
    line.separator = \r \n
    native.encoding = GBK
    os.arch = amd64
    os.name = Windows 11
    os.version = 10.0
    path.separator = ;
    sun.arch.data.model = 64
    sun.boot.library.path = C:\Program Files\Eclipse Adoptium\jdk-18.0.1.10-hotspot\bin
    sun.cpu.endian = little
    sun.cpu.isalist = amd64
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = GBK
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.os.patch.level =
    sun.stderr.encoding = UTF-8
    sun.stdout.encoding = UTF-8
    user.country = CN
    user.dir = C:\Users\everything411
    user.home = C:\Users\everything411
    user.language = zh
    user.name = everything411
    user.script =
    user.variant =

openjdk version "18.0.1" 2022-04-19
OpenJDK Runtime Environment Temurin-18.0.1+10 (build 18.0.1+10)
OpenJDK 64-Bit Server VM Temurin-18.0.1+10 (build 18.0.1+10, mixed mode, sharing)

everything411 avatar May 05 '22 14:05 everything411

Digression: Why do you want to enable this beta utf-8 option?

imba-tjd avatar May 06 '22 01:05 imba-tjd

@imba-tjd Linux and macOS both set the default encoding to UTF-8. I need to share source code containing Chinese characters between my Windows machine and my Linux machine (WSL1 and WSL2 use UTF-8, too).

everything411 avatar May 06 '22 05:05 everything411

Did you try to input Chinese from stdin? Try this:

System.out.println(new Scanner(System.in).nextLine());

You will find that it fails to read the input if you have enabled the beta UTF-8 option.

imba-tjd avatar May 06 '22 05:05 imba-tjd

I have also noticed this bug before. In fact, it affects not only Java: C scanf, C++ cin and C# Console.ReadLine all fail to accept Chinese input when UTF-8 is enabled.

I believe that this is a Windows console bug rather than a language runtime bug. See https://docs.microsoft.com/zh-cn/windows/console/classic-vs-vt and https://github.com/microsoft/terminal/issues/7777

everything411 avatar May 06 '22 09:05 everything411

The issue https://bugs.openjdk.org/browse/JDK-8272352 might be relevant here; it was backported to OpenJDK 11.0.17 and Java 17.0.5 quite recently.

(Just passing by... I saw this thread as I was fixing Unicode problems in the NetBeans IDE.)

eirikbakke avatar Nov 29 '22 17:11 eirikbakke

@everything411 Are you able to try with our latest 17.0.5 build? As @eirikbakke mentions, the upstream issue seems to be fixed.

karianna avatar Nov 29 '22 22:11 karianna

@karianna I can confirm that all bugs I reported here no longer exist now.

everything411 avatar Nov 30 '22 01:11 everything411