Chinese and Japanese garbled characters appear
Description: Create a stream containing a script application. If the script contains Japanese or Chinese characters, the following errors occur.
- SQLException occurs when deploying. This is because the data field of skipper_manifest and the pkg_json_string field of skipper_release in the skipper database are encoded as latin1. I changed the encoding of these fields to utf8, which solved the problem.
- Garbled characters. After solving the first problem, I checked skipper_release and skipper_manifest and found that the Japanese and Chinese text is stored in both tables as ??? (garbled). I still don't know how to solve this problem.
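To illustrate why a latin1 column can end up holding ???, here is a minimal Python sketch. It only mimics the effect: it assumes the database or driver substitutes '?' for characters the column character set cannot represent, which is not confirmed in this issue.

```python
# Illustration only: latin1 has no code points for CJK characters, so a lossy
# conversion turns each character into '?'. UTF-8 round-trips the text intact.
script_literal = "嗨你好世界"

# Assumption: the lossy path behaves like errors="replace".
stored_as_latin1 = script_literal.encode("latin-1", errors="replace")

# The same text stored as UTF-8 survives a round trip.
stored_as_utf8 = script_literal.encode("utf-8")
read_back = stored_as_utf8.decode("utf-8")

print(stored_as_latin1)  # one '?' per original character
print(read_back == script_literal)
```

Once the text has been replaced by '?', the original characters cannot be recovered from the table, so rows written before the encoding change would need to be saved again.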
Release versions: Spring Cloud Data Flow 2.9.0, script application 3.2.1
Steps to reproduce:
http | script --language=python --script=payload=payload+'嗨你好世界' | log

@onobc This seems similar to a recent issue you dealt with.
After setting Database encoding to UTF-8 I was able to save the DSL.
I set LANG=en_US.UTF-8, LC_ALL=en_US.UTF-8 and JDK_JAVA_OPTIONS=-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 in the dataflow and skipper container environments, and also added them to the skipper platform environmentVariables.
After these changes, the applications have the following as the first entry in their logs:
NOTE: Picked up JDK_JAVA_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
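A quick way to double-check these settings inside a container is a small script like the one below. This is only a sketch: missing_utf8_settings is a hypothetical helper, not part of Data Flow or Skipper, and it only inspects the environment variables named above rather than what the JVM actually uses.

```python
import shlex


def missing_utf8_settings(env):
    """Return the names of the settings from this thread that are absent or not UTF-8."""
    problems = []
    for name in ("LANG", "LC_ALL"):
        value = env.get(name, "")
        # Accept both spellings seen in this thread: en_US.UTF-8 and en_US.utf8.
        codeset = value.partition(".")[2].lower().replace("-", "")
        if codeset != "utf8":
            problems.append(name)
    # JDK_JAVA_OPTIONS is a space-separated list of JVM options.
    options = shlex.split(env.get("JDK_JAVA_OPTIONS", ""))
    for prop in ("file.encoding", "sun.jnu.encoding"):
        if "-D" + prop + "=UTF-8" not in options:
            problems.append(prop)
    return problems


# Example environment matching the settings described above.
example_env = {
    "LANG": "en_US.UTF-8",
    "LC_ALL": "en_US.UTF-8",
    "JDK_JAVA_OPTIONS": "-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8",
}
print(missing_utf8_settings(example_env))
```

Passing dict(os.environ) instead of example_env checks the environment of the current process.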
I used kubectl exec kafka-broker-0 -- /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --whitelist ".*" to log messages.
Implemented http | log and sent a message containing Japanese characters. The message is logged by log-sink and kafka-console-consumer with the expected values.
Implemented http | transform | log with app.transform.spel.function.expression=payload+'嗨你好世界'. The kanji after payload is replaced with ��������������� while any kanji in the message itself is preserved. Implemented http | script | log with the same expression for the script; the same happened with both groovy and python. The logs show: Input script is 'payload+'���������������'', language is 'python'
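One possible explanation for the run of replacement characters, sketched in Python: if the UTF-8 bytes of the property value are decoded as ASCII, every byte above 0x7F becomes U+FFFD. That the property is decoded with an ASCII default charset is an assumption here, not something the logs prove, although the length of the garbled run appears to match.

```python
# Five CJK characters occupy three bytes each in UTF-8.
expression_literal = "嗨你好世界"
utf8_bytes = expression_literal.encode("utf-8")

# Assumption: the bytes are decoded with an ASCII charset that substitutes
# U+FFFD for every byte it cannot map.
garbled_literal = utf8_bytes.decode("ascii", errors="replace")

print(len(utf8_bytes))       # number of bytes in the UTF-8 form
print(len(garbled_literal))  # one replacement character per byte
```

Message payloads are unaffected in this scenario because they never pass through that decoding step, which would be consistent with kanji in the message being preserved.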
Possible problems:
- Java default encoding isn't modified to UTF-8 after these settings.
- The handling of the property for expression or script is not using UTF-8 encoding as configured.
- Transform and Script functions aren't using UTF-8 encoding as configured.
Python script doesn't execute after the encoding changes:
Caused by: javax.script.ScriptException: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 4: ordinal not in range(128) in <script> at line number 1
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.jsr223.PyScriptEngine.scriptException(PyScriptEngine.java:222)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:59)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:31)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at java.scripting/javax.script.AbstractScriptEngine.eval(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.springframework.integration.scripting.jsr223.AbstractScriptExecutor.executeScript(AbstractScriptExecutor.java:84)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: ... 45 more
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: Caused by: Traceback (most recent call last):
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: File "<script>", line 1, in <module>
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 4: ordinal not in range(128)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]:
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.strict_errors(codecs.java:204)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at java.base/java.lang.reflect.Method.invoke(Unknown Source)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.JavaFunc.__call__(Py.java:2895)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyObject.__call__(PyObject.java:433)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.decoding_error(codecs.java:1603)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.insertReplacementAndGetResume(codecs.java:1572)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.PyUnicode_DecodeIntLimited(codecs.java:1161)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.PyUnicode_DecodeASCII(codecs.java:1144)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.codecs.decode(codecs.java:92)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyString.decode(PyString.java:4015)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyString.decode(PyString.java:4007)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyUnicode.coerceToStringOrNull(PyUnicode.java:1012)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyUnicode.unicode___add__(PyUnicode.java:1172)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyUnicode.__add__(PyUnicode.java:1166)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyObject._basic_add(PyObject.java:2083)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyObject._add(PyObject.java:2068)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.pycode._pyx2.f$0(<script>:1)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.pycode._pyx2.call_function(<script>)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyTableCode.call(PyTableCode.java:173)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.PyCode.call(PyCode.java:18)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.Py.runCode(Py.java:1687)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.__builtin__.eval(__builtin__.java:497)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.core.__builtin__.eval(__builtin__.java:501)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.util.PythonInterpreter.eval(PythonInterpreter.java:255)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: at org.python.jsr223.PyScriptEngine.eval(PyScriptEngine.java:57)
default/script-test-script-v4-d78955d56-rjq6k[script-test-script-v4]: ... 48 more
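The trace goes through PyUnicode.__add__ and PyUnicode_DecodeASCII, which suggests a unicode value is being concatenated with a byte string that Jython (Python 2 semantics) implicitly decodes as ASCII. The Python 3 sketch below mimics that step explicitly. The sample bytes are hypothetical, chosen only so that the message matches the one in the log, and concat_like_python2 is an illustrative helper.

```python
def concat_like_python2(unicode_text, byte_string):
    """Mimic Python 2's `unicode + str`: the byte string is decoded as ASCII first."""
    return unicode_text + byte_string.decode("ascii")


# Hypothetical input: four ASCII bytes followed by the UTF-8 form of U+FFFD (EF BF BD).
sample_bytes = b"abcd" + "\ufffd".encode("utf-8")

try:
    concat_like_python2("payload", sample_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding explicitly with the real encoding avoids the implicit ASCII step.
decoded_ok = "payload" + sample_bytes.decode("utf-8")
```

This only shows the mechanism behind the error. It does not establish which bytes the script application actually received.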
The root cause lies in the base image used by Spring Cloud Stream Applications. I propose updating the build process to use Paketo buildpacks, like the Spring Boot build image and the existing Spring Cloud Data Flow builds. We can publish images for Java 8, 11 and 17 and tag 11 as the default for the version, as we do with SCDF.
The snapshot images have been created with Paketo Buildpacks, like Spring Boot images. They can be registered with https://repo.spring.io/snapshot/org/springframework/cloud/stream/app/stream-applications-descriptor/2021.1.3-SNAPSHOT/stream-applications-descriptor-2021.1.3-SNAPSHOT.stream-apps-kafka-docker
The application versions are 3.2.2-SNAPSHOT.
The default encoding for the container and JVM has been set to en_US.UTF-8 using the environment variables LANG=en_US.utf8, LC_ALL=en_US.utf8 and JDK_JAVA_OPTIONS=-Dfile.encoding=UTF-8 -Dsun.jnu.encoding
Users will only need to update LANG or LC_ALL if they need a different locale.
Fixed in SNAPSHOT builds.