Ignore unsupported UTF-8 characters

Open DimitriFourny opened this issue 11 months ago • 4 comments

If a file contains a byte sequence that is not valid UTF-8, the whole runner.py script fails. Ignoring the invalid bytes seems to be the best solution.

DimitriFourny avatar Feb 20 '25 09:02 DimitriFourny
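
For illustration, the approach suggested in the issue description (ignoring bytes that cannot be decoded) might look roughly like the sketch below. It is not taken from runner.py: `read_text_lenient` and `demo` are hypothetical names, and it assumes the script reads files in text mode with Python's built-in `open`. The only library feature relied on is the standard `errors` argument of `open`.

```python
import os
import tempfile


def read_text_lenient(path, errors="replace"):
    """Read a file as UTF-8 without raising on invalid bytes.

    errors="replace" turns each undecodable byte into U+FFFD;
    errors="ignore" drops it silently.
    (Hypothetical helper, not part of runner.py.)
    """
    with open(path, "r", encoding="utf-8", errors=errors) as f:
        return f.read()


def demo():
    # Write a temporary file containing the stray byte 0x84,
    # the byte value quoted in the reported error.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"abc\x84def")
        return (
            read_text_lenient(path, "ignore"),
            read_text_lenient(path, "replace"),
        )
    finally:
        os.remove(path)
```

With `errors="ignore"` the bad byte disappears from the result, while `errors="replace"` leaves a visible U+FFFD marker, which may make it easier to spot which generated files were affected.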

What is the error?

Waqar144 avatar Feb 20 '25 09:02 Waqar144

I don't remember the exact byte value and position, but it was something like:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 747: invalid start byte

DimitriFourny avatar Feb 20 '25 15:02 DimitriFourny

Do you have some minimal code with which I can reproduce the issue?

Waqar144 avatar Feb 21 '25 08:02 Waqar144

Unfortunately no, but I was running codebrowser on the Chromium source code. In fact, just putting an invalid UTF-8 byte in one of the generated files will reproduce the issue. It was in the /refs directory.

DimitriFourny avatar Feb 21 '25 09:02 DimitriFourny
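
The reproduction described in the last comment (an invalid UTF-8 byte in a generated file) can be sketched in isolation as follows. The file name `bad_ref`, the filler content, and the helper names are made up for the example; the real layout of the /refs directory is not shown in this thread. The byte 0x84 and offset 747 are taken from the error message quoted above, which the author says may not be exact.

```python
import tempfile
from pathlib import Path


def make_bad_file(directory, offset=747):
    """Create a file whose byte at `offset` is not valid UTF-8.

    The file name and filler bytes are placeholders for this sketch.
    """
    path = Path(directory) / "bad_ref"
    # 0x84 is a UTF-8 continuation byte, so it cannot start a character.
    path.write_bytes(b"a" * offset + b"\x84" + b"\n")
    return path


def try_strict_decode(path):
    """Decode the file strictly; return the exception, or None on success."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


def reproduce(offset=747):
    # Build the bad file in a throwaway directory and attempt to decode it.
    with tempfile.TemporaryDirectory() as tmp:
        return try_strict_decode(make_bad_file(tmp, offset))
```

Calling `reproduce()` returns a `UnicodeDecodeError` whose `start` attribute is the offset of the bad byte, which should be enough to exercise any strict UTF-8 read in a script.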