Out-of-bounds read after UTF-8 BOM

Open · stevenjohnstone opened this issue 1 month ago · 0 comments

On 1da0733f147ed0a4547791576db53c04780cd498, compile & run the following to see an out-of-bounds read:

```c
#include <prism.h>


// utf-8 BOM
uint8_t input[] = {
    0xEF, 0xBB, 0xBF
};


int main(int argc, const char **argv) {
    (void)argc;
    (void)argv;
    pm_parse_success_p(input, sizeof(input), NULL);
    return 0;
}
```
```console
$ clang -Iinclude $(find src -name "*.c") -ggdb3 testcase.c -fsanitize=address -o testcase
$ ./testcase
=================================================================
==94==ERROR: AddressSanitizer: global-buffer-overflow on address 0xaaaac723a4e3 at pc 0xaaaac7070bac bp 0xffffffb5bcb0 sp 0xffffffb5bca8
READ of size 1 at 0xaaaac723a4e3 thread T0
    #0 0xaaaac7070ba8 in pm_parser_init /prism/src/prism.c:22867:23
    #1 0xaaaac7074180 in pm_parse_success_p /prism/src/prism.c:23130:5
    #2 0xaaaac717e434 in main /prism/testcase.c:13:5
    #3 0xffff8bf873fc in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0xffff8bf874d4 in __libc_start_main csu/../csu/libc-start.c:392:3
    #5 0xaaaac6f2302c in _start (/prism/testcase+0x5302c) (BuildId: 25a9e93a18b23689a7b2d9a3dd3aaa1dea544223)

0xaaaac723a4e3 is located 0 bytes after global variable 'input' defined in '/prism/testcase.c:5' (0xaaaac723a4e0) of size 3
SUMMARY: AddressSanitizer: global-buffer-overflow /prism/src/prism.c:22867:23 in pm_parser_init
Shadow bytes around the buggy address:
  0xaaaac723a200: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0xaaaac723a280: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0xaaaac723a300: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0xaaaac723a380: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0xaaaac723a400: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
=>0xaaaac723a480: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9[03]f9 f9 f9
  0xaaaac723a500: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0xaaaac723a580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0xaaaac723a600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0xaaaac723a680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0xaaaac723a700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==94==ABORTING
```

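It's a tiny issue, but fuzzing keeps running into it. The shebang check measures the first line starting from `parser->start`, which still points at the BOM, but then reads `parser->current.end[0]` and `parser->current.end[1]`, and `parser->current.end` has already been advanced past the BOM. With the 3-byte input above, no newline is found, so `length` is 3, the `length > 2` guard passes, and `parser->current.end[0]` reads the byte just past the end of `input`, which is the read ASan reports.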

I'm using the following patch locally to work around it:

```diff
diff --git a/src/prism.c b/src/prism.c
index c419fc9e..52e46aea 100644
--- a/src/prism.c
+++ b/src/prism.c
@@ -22860,8 +22860,8 @@ pm_parser_init(pm_parser_t *parser, const uint8_t *source, size_t size, const pm
     // If the shebang does not include "ruby" and this is the main script being
     // parsed, then we will start searching the file for a shebang that does
     // contain "ruby" as if -x were passed on the command line.
-    const uint8_t *newline = next_newline(parser->start, parser->end - parser->start);
-    size_t length = (size_t) ((newline != NULL ? newline : parser->end) - parser->start);
+    const uint8_t *newline = next_newline(parser->current.end, parser->end - parser->current.end);
+    size_t length = (size_t) ((newline != NULL ? newline : parser->end) - parser->current.end);

     if (length > 2 && parser->current.end[0] == '#' && parser->current.end[1] == '!') {
         const char *engine;
```

stevenjohnstone · Nov 25 '25 18:11