CompCert icon indicating copy to clipboard operation
CompCert copied to clipboard

Attacks by introducing invisible source code using special Unicode characters

Open MSoegtropIMC opened this issue 4 years ago • 3 comments

In this paper: (https://www.trojansource.codes/trojan-source.pdf) researchers from the University of Cambridge describe an attack which uses special Unicode characters to inject invisible or modified code. As far as I can tell any compiler taking Unicode input is affected by this. I am not sure if CompCert accepts (full) Unicode input - I just guess it does. But as ultra high reliability compiler, it should be checked if CompCert has this vulnerability.

There is an easy remedy, which is rejecting sources which contain such characters.

MSoegtropIMC avatar Nov 04 '21 08:11 MSoegtropIMC

I'm aware of this paper. One could argue that the problem is more with the text editor used to display and review code than with the compiler. But it probably makes sense to harden the compiler anyway.

For CompCert, it also means clarifying the support for UTF-8 in character and string literals. Currently, CompCert treats everything as sequences of bytes, even in wide literals, and makes no attempts to validate or give meaning to UTF-8 sequences.

xavierleroy avatar Nov 04 '21 10:11 xavierleroy

The issue I see is that in industrial environments one might have a bit exotic source review process tools, where it might take a longer time to fix this or even be overlooked.

MSoegtropIMC avatar Nov 04 '21 12:11 MSoegtropIMC

GCC is impacted by this, which means CompCert is also impacted by this. (I confirmed with examples from the "trojan source" repo.)

However, I do not believe this is a serious vulnerability deserving of additional mitigations at this time:

  1. Programs verified with VST are not effected.
  2. GitHub added a detector for these shenanigans right away.
  3. This is a good problem for a linter.
  4. There are easier ways to sneak a "bugdoor" into a C program.

image

intoverflow avatar Nov 16 '21 05:11 intoverflow

This issue was discussed among the CompCert developers a while ago, but I forgot to transcribe the discussion here. In summary:

  • We're unconvinced that the attack is serious enough to deserve specific treatment in the compiler. Note that some code reviewing tools such as Github's already warn about strange uses of bidirectional markers in sources.
  • CompCert calls an external preprocessor to read and preprocess C source files. If the preprocessor happens to detect and report strange uses of bidirectional markers, CompCert benefits too. GCC 12 does it very well (see https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12). So, if you use CompCert with GCC 12 as the preprocessor, you're protected. Let's hope that Clang will follow in this direction.
  • If the preprocessor doesn't analyze "bidi" markers, CompCert is in a bad position to do it itself, because preprocessing generally strips comments off, so the CompCert lexer would miss "bidi" attacks that reside in comments. Or, it would need to duplicate some of the preprocessor's work (to chase include files for instance) so as to analyze source code before preprocessing.
  • Finally, with the soon-to-be-released CompCert 3.12, you can pass -finput-charset options to the preprocessor. With GCC as the preprocessor, -finput-charset=ascii rejects all non-ASCII characters, including "bidi" markers. That's harsh but could make sense for critical embedded software.

For all these reasons, we think that no change to CompCert itself is required. So, I'll close this issue.

xavierleroy avatar Nov 11 '22 16:11 xavierleroy

Thanks for the detailed analysis! The -finput-charset=ascii option is sufficient for our purposes.

The consequences of the article you linked to CompCert are not fully clear to me, though. Do I need to pass an option to CompCert to enable the new warnings introduced in GCC 12 or is using GCC 12 for preprocessing sufficient?

MSoegtropIMC avatar Nov 11 '22 17:11 MSoegtropIMC

From my limited experiments with GCC 12, the "bidi attack" warning is on by default, so you should get it as soon as you use GCC 12 for preprocessing. Turning the warning into an error will take a bit more work,e.g. something like -Wp,-Werror.

xavierleroy avatar Nov 11 '22 17:11 xavierleroy