Chinese and Japanese characters appear garbled

Open yuhuyuyu opened this issue 3 years ago • 1 comments

Description: Create a stream containing a script application. If the script contains Japanese or Chinese characters, errors occur.

  1. SQLException when deploying. This occurs because the date field of skipper_manifest and the pkg_json_string field of skipper_release in the skipper database are encoded as latin1. Changing the encoding of these fields to utf8 solved the problem.
  2. Garbled characters. After solving problem 1, I checked skipper_release and skipper_manifest and found that the Japanese and Chinese characters are stored in both tables as ??? (garbled). I still don't know how to solve this problem.
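
As an illustration of why a latin1 column cannot hold this text, here is a minimal Python sketch. It is only an analogy for the database behaviour (an assumption, not the actual Skipper or JDBC code path): strict conversion to Latin-1 fails, and lossy conversion turns every character into a question mark, while UTF-8 round-trips cleanly.

```python
# Illustrative only: Python's codecs stand in for the column encoding.
text = "嗨你好世界"

# Strict conversion fails because Latin-1 has no code points for these characters.
try:
    text.encode("latin-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Lossy conversion substitutes '?' for each character that cannot be represented.
lossy = text.encode("latin-1", errors="replace")

# UTF-8 can represent every character, so the round trip is lossless.
roundtrip = text.encode("utf-8").decode("utf-8")

print(strict_ok)          # False
print(lossy)              # b'?????'
print(roundtrip == text)  # True
```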

Release versions: Spring Cloud Data Flow 2.9.0, script application 3.2.1

Steps to reproduce:

http | script --language=python --script=payload=payload+'嗨你好世界' | log

yuhuyuyu avatar Aug 01 '22 11:08 yuhuyuyu

@onobc This seems similar to a recent issue you dealt with.

corneil avatar Aug 04 '22 11:08 corneil

After setting the database encoding to UTF-8 I was able to save the DSL. I set the dataflow and skipper container environment variables LANG=en_US.UTF-8 and LC_ALL=en_US.UTF-8, as well as JDK_JAVA_OPTIONS=-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8, and added the same to the skipper platform environmentVariables. After these changes the applications show the following as the first entry in their logs:

NOTE: Picked up JDK_JAVA_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8

Using: kubectl exec kafka-broker-0 -- /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --whitelist ".*" to log messages.

Created http | log and sent a message containing Japanese characters. The message is logged by the log sink and kafka-console-consumer with the expected values.

Created http | transform | log with app.transform.spel.function.expression=payload+'嗨你好世界'. The kanji after payload are replaced with ��������������� while any kanji in the message itself are preserved. Created http | script | log with the same expression for the script. The same happened with groovy and python. The logs show: Input script is 'payload+'���������������'', language is 'python'

Possible problems:

  • Java default encoding isn't modified to UTF-8 after these settings.
  • The handling of the property for expression or script is not using UTF-8 encoding as configured.
  • Transform and Script functions aren't using UTF-8 encoding as configured.
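
The pattern of replacement characters is consistent with UTF-8 bytes being decoded by a component that only accepts ASCII. The following Python sketch is an assumption about the mechanism, not the actual Spring code path; it shows that each of the five characters occupies three bytes in UTF-8, and that a lossy ASCII decode yields one replacement character per byte.

```python
# Illustrative only: reproduce the garbling with Python's codecs.
literal = "嗨你好世界"

# Each of these characters needs three bytes in UTF-8.
utf8_bytes = literal.encode("utf-8")

# Decoding as ASCII with errors="replace" maps every non-ASCII byte to U+FFFD.
garbled = utf8_bytes.decode("ascii", errors="replace")

print(len(utf8_bytes))          # 15
print(garbled.count("\ufffd"))  # 15
print(utf8_bytes.decode("utf-8") == literal)  # True
```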

Python script doesn't execute after the encoding changes:

Caused by: javax.script.ScriptException: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 4: ordinal not in range(128) in <script> at line number 1
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.jsr223.PyScriptEngine.scriptException(PyScriptEngine.java:222)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:59)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:31)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at java.scripting/javax.script.AbstractScriptEngine.eval(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.springframework.integration.scripting.jsr223.AbstractScriptExecutor.executeScript(AbstractScriptExecutor.java:84)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	... 45 more
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: Caused by: Traceback (most recent call last):
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]:   File "<script>", line 1, in <module>
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 4: ordinal not in range(128)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.strict_errors(codecs.java:204)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.JavaFunc.__call__(Py.java:2895)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyObject.__call__(PyObject.java:433)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.decoding_error(codecs.java:1603)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.insertReplacementAndGetResume(codecs.java:1572)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.PyUnicode_DecodeIntLimited(codecs.java:1161)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.PyUnicode_DecodeASCII(codecs.java:1144)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.codecs.decode(codecs.java:92)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyString.decode(PyString.java:4015)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyString.decode(PyString.java:4007)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyUnicode.coerceToStringOrNull(PyUnicode.java:1012)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyUnicode.unicode___add__(PyUnicode.java:1172)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyUnicode.__add__(PyUnicode.java:1166)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyObject._basic_add(PyObject.java:2083)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyObject._add(PyObject.java:2068)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.pycode._pyx2.f$0(<script>:1)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.pycode._pyx2.call_function(<script>)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyTableCode.call(PyTableCode.java:173)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.PyCode.call(PyCode.java:18)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.Py.runCode(Py.java:1687)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.__builtin__.eval(__builtin__.java:497)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.core.__builtin__.eval(__builtin__.java:501)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.util.PythonInterpreter.eval(PythonInterpreter.java:255)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:57)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: 	... 48 more
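
The trace shows PyUnicode.__add__ calling coerceToStringOrNull and then PyUnicode_DecodeASCII, that is, the unicode payload is concatenated with a byte string that is implicitly decoded as ASCII. The Python 3 sketch below performs that decode step explicitly. The "abcd" prefix is hypothetical, chosen only so that the offending byte lands at position 4 as in the log; 0xEF is, for example, the leading byte of U+FFFD in UTF-8.

```python
# Illustrative only: explicit version of the implicit ASCII decode in the trace.
# The "abcd" prefix is a hypothetical stand-in for the real data.
raw = "abcd".encode("ascii") + "\ufffd".encode("utf-8")

try:
    raw.decode("ascii")
    message = None
    error_position = None
except UnicodeDecodeError as exc:
    message = str(exc)
    error_position = exc.start

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("utf-8")

print(message)
print(error_position)
print(len(decoded))
```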

corneil avatar Oct 20 '22 13:10 corneil

The root cause lies in the base image used by Spring Cloud Stream Applications. I propose updating the build process to use Paketo Buildpacks, like the Spring Boot build image and the existing Spring Cloud Data Flow builds. We can publish images for Java 8, 11 and 17 and tag 11 as the default for the version, as we do with SCDF.

corneil avatar Oct 24 '22 10:10 corneil

The snapshot images have been created with Paketo Buildpacks, like Spring Boot images. They can be registered with https://repo.spring.io/snapshot/org/springframework/cloud/stream/app/stream-applications-descriptor/2021.1.3-SNAPSHOT/stream-applications-descriptor-2021.1.3-SNAPSHOT.stream-apps-kafka-docker

The application versions are 3.2.2-SNAPSHOT

The default encoding for the container and JVM has been set to en_US.UTF-8 using the environment variables LANG=en_US.utf8, LC_ALL=en_US.utf8 and JDK_JAVA_OPTIONS=-Dfile.encoding=UTF-8 -Dsun.jnu.encoding

Users will only need to update LANG or LC_ALL if they need a different locale.
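
For anyone who wants to confirm these settings inside a running container, a small check could look like the sketch below. The helper name missing_utf8_settings is hypothetical and not part of SCDF or the stream applications; it only inspects the three variables named above.

```python
# Hypothetical helper: report which encoding-related variables are unset or not UTF-8.
def missing_utf8_settings(env):
    problems = []
    for name in ("LANG", "LC_ALL"):
        # Accept both spellings, e.g. en_US.utf8 and en_US.UTF-8.
        value = env.get(name, "").lower().replace("-", "")
        if not value.endswith("utf8"):
            problems.append(name)
    if "-Dfile.encoding=UTF-8" not in env.get("JDK_JAVA_OPTIONS", ""):
        problems.append("JDK_JAVA_OPTIONS")
    return problems


configured = {
    "LANG": "en_US.utf8",
    "LC_ALL": "en_US.UTF-8",
    "JDK_JAVA_OPTIONS": "-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8",
}
print(missing_utf8_settings(configured))      # []
print(missing_utf8_settings({"LANG": "C"}))   # ['LANG', 'LC_ALL', 'JDK_JAVA_OPTIONS']
```

Passing os.environ instead of a sample dictionary checks the current process environment.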

corneil avatar Oct 27 '22 21:10 corneil

Fixed in SNAPSHOT builds.

corneil avatar Nov 04 '22 10:11 corneil