
Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371.

Open FlydragonTy opened this issue 4 years ago • 3 comments

JerryScript revision

Commit: a6ab5e9

Version: v3.0.0

Build platform

Ubuntu 18.04.5 LTS (Linux 4.19.128-microsoft-standard x86_64)

Ubuntu 18.04.5 LTS (Linux 5.4.0-44-generic x86_64)

Build steps
python ./tools/build.py --clean --debug --compile-flag=-fsanitize=address --compile-flag=-m32 --compile-flag=-g --strip=off --lto=off --logging=on --line-info=on --error-message=on --system-allocator=on --stack-limit=20
Test case

poc-as.txt

Execution steps & Output
$ ./jerryscript/build/bin/jerry poc.js

ICE: Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371.
Error: ERR_FAILED_INTERNAL_ASSERTION
[1]    abort      jerry poc.js

Credits: Found by OWL337 team.

FlydragonTy, Jan 04 '22 06:01

@rerobika I think it is not a bug, but a feature. "𞸋" is encoded in UTF-8 as 0xF09EB88B, which is invalid in CESU-8. But of course we could raise a user-friendly error message instead of an assertion.
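To make the byte values quoted above concrete, here is a minimal Python sketch of how a single character maps to UTF-8 and to CESU-8. The helper name `to_cesu8` is hypothetical and is not part of JerryScript; it only illustrates the encoding rule (a non-BMP code point is split into a UTF-16 surrogate pair, and each surrogate is then written as a 3-byte sequence).

```python
def to_cesu8(ch: str) -> bytes:
    """Encode one character as CESU-8 (illustrative helper, not a JerryScript API)."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP code points are encoded exactly as in UTF-8 (1 to 3 bytes).
        return ch.encode("utf-8")
    # Non-BMP code points become a UTF-16 surrogate pair first.
    cp -= 0x10000
    high = 0xD800 | (cp >> 10)
    low = 0xDC00 | (cp & 0x3FF)
    # Each surrogate is then written as a 3-byte sequence: 6 bytes in total.
    return b"".join(
        chr(s).encode("utf-8", "surrogatepass") for s in (high, low)
    )


ch = "\U0001EE0B"  # the "𞸋" character from the comment above
print(ch.encode("utf-8").hex())  # f09eb88b
print(to_cesu8(ch).hex())        # eda0bbedb88b
```

The 4-byte form `f0 9e b8 8b` is valid UTF-8 but not valid CESU-8, while the 6-byte form is the CESU-8 equivalent.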

ossy-szeged, Jan 10 '22 12:01

The issue is not with the "𞸋" character; all non-BMP characters are converted to CESU-8 encoding during parsing. The problem is that the first character is in the Basic Multilingual Plane and should be encoded using 3 bytes, but it is encoded using 4 bytes in the input. This breaks the conversion logic, which always expects the CESU-8 equivalent of a 4-byte sequence to be 6 bytes long.
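The overlong case described above can be sketched in Python as follows. This is not JerryScript's parser code, and the function names are hypothetical. The euro sign (U+20AC) is used purely as an assumed example of a BMP character; the actual bytes in the PoC are not shown in this thread.

```python
def decode_utf8_4byte(seq: bytes) -> int:
    """Decode a 4-byte UTF-8-shaped sequence without rejecting overlong forms."""
    if (
        len(seq) != 4
        or (seq[0] & 0xF8) != 0xF0
        or any((b & 0xC0) != 0x80 for b in seq[1:])
    ):
        raise ValueError("not a 4-byte UTF-8 sequence")
    return (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )


def is_overlong_4byte(seq: bytes) -> bool:
    """A 4-byte sequence is overlong if its code point fits in the BMP."""
    return decode_utf8_4byte(seq) < 0x10000


# U+20AC is normally e2 82 ac (3 bytes); f0 82 82 ac is an overlong 4-byte form.
overlong = bytes.fromhex("f08282ac")
valid = bytes.fromhex("f09eb88b")  # "𞸋", a genuine non-BMP character

print(hex(decode_utf8_4byte(overlong)), is_overlong_4byte(overlong))
print(hex(decode_utf8_4byte(valid)), is_overlong_4byte(valid))
```

A converter that assumes every 4-byte sequence yields a surrogate pair would subtract 0x10000 from a code point below 0x10000, so a check like `is_overlong_4byte` before conversion is one way such input could be rejected with a regular error.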

dbatyai, Jan 10 '22 13:01

Additional info: a simple /*𝔽*/ string fails with the same error if we build with tools/build.py --debug --function-to-string=on

ossy-szeged, Jan 11 '22 12:01