cppcheck icon indicating copy to clipboard operation
cppcheck copied to clipboard

Optimize matchcompiler: create ConstStrings once

Open rikardfalkeborn opened this issue 3 years ago • 19 comments

Reduces IR with ~2.5%.

rikardfalkeborn avatar Aug 02 '22 22:08 rikardfalkeborn

This looks really interesting. Could you also provide a small diff example of the generated code?

firewave avatar Aug 03 '22 06:08 firewave

I did a small testrun on mame_regtest with DISABLE_VALUEFLOW=1 and --enable=all --inconclusive on Clang 13 and the Ir actually increased:

2,472,706,153-> 2,500,915,833

I guess you used GCC.

Digging into it (via findLambdaEndToken()) I see operator StringRef() which wasn't there before and more spend within equalN(). So I guess the const object prevents some better inlining which was happening before.

I dabbled in getting rid of some of the indirections in MatchCompiler::makeConstString() in the past and it also have mixed results with being less performant although there's less code. That's when I hit another case where the compilers are behaving quite differently.

I tried constexpr on ConstString on top of your changes and that reduced the Ir to 2,468,887,538. The operator StringRef() calls are now gone. I still see slight increases but they are hard to track. It seems they are related to std::string:find*() now being more expensive!? I remember coming across this before - I think it was when we moved from Clang 13 to Clang 14.

Here's the diff:

diff --git a/lib/matchcompiler.h b/lib/matchcompiler.h
--- a/lib/matchcompiler.h	(revision 46984ae45eee127ce52730540a3583454ee315d1)
+++ b/lib/matchcompiler.h	(date 1659509930132)
@@ -28,7 +28,7 @@
     class ConstString {
     public:
         typedef const char(&StringRef)[n];
-        explicit ConstString(StringRef s)
+        explicit constexpr ConstString(StringRef s)
             : _s(s) {}
 
         operator StringRef() const {
@@ -64,7 +64,7 @@
     }
 
     template<unsigned int n>
-    inline ConstString<n> makeConstString(const char (&s)[n])
+    inline constexpr ConstString<n> makeConstString(const char (&s)[n])
     {
         return ConstString<n>(s);
     }
diff --git a/tools/matchcompiler.py b/tools/matchcompiler.py
--- a/tools/matchcompiler.py	(revision 46984ae45eee127ce52730540a3583454ee315d1)
+++ b/tools/matchcompiler.py	(date 1659509556848)
@@ -721,7 +721,7 @@
 
         constants = u''
         for constString, constStringId in self._constStringCache.items():
-            constants += 'static const auto '  + constStringId + ' = MatchCompiler::makeConstString("' + constString + '");\n'
+            constants += 'static constexpr auto '  + constStringId + ' = MatchCompiler::makeConstString("' + constString + '");\n'
 
         lineno = u''
         if line_directive:

And dropping your changes with only ConstString supporting constexpr I got 2,472,695,908. If I dig into it there's part which slightly regress and other which improve.

I will give GCC a spin later today.

firewave avatar Aug 03 '22 07:08 firewave

In checkunusedfunctions.cpp the code creates an object and replaces it for

// Taking care of funcName == "operator", which is no valid operator

We at least need to skip lines which start with //. This would also fix such a case in checktype.cpp.

In symboldatabase.cpp there's one where the comment starts within the line which would be harder to support.

All of those generate -Wunused-const-variable compiler warnings with Clang.

I came across this in other MatchCompiler replacements but somehow didn't fix it yet.

firewave avatar Aug 03 '22 07:08 firewave

GCC 12 2,395,429,828 - main 2,421,359,726 - this PR 2,395,496,634 - this PR + constexpr ConstString + static constexpr 2,395,496,606 - this PR + constexpr ConstString 2,395,429,826 - constexpr ConstString

Interestingly with your patch it uses more Ir with GCC as well - what toolchain did you use? I am aware less instructions is not automatically better. There's better but more complicated ways to measure it than callgrind.

This looks remarkedly similar to what I experienced while working on #3725.

Here's the overview for Clang 13: 2,472,706,153 - main 2,500,915,833 - this PR 2,468,887,538 - this PR + constexpr ConstString + static constexpr 2,468,875,670 - this PR + constexpr ConstString 2,472,695,908 - constexpr ConstString

This obviously needs a bigger testrun.

firewave avatar Aug 03 '22 08:08 firewave

This is why I don't like optimizations :)

I'm using gcc 12.1.1, valgrind 3.19.0, linux. I did the measurements using callgrind when running cppcheck on main.cpp in simplecpp (I used --enable=all but if I recall correctly, that didn't matter).

I didn't notice the unused variable warning, I guess we could add a check for comments in MatchCompiler.

Another thing I noticed was that make didn't realize it needed to recompile after changes in matchcompiler.h, but that's a separate issue.

Finally, I'm running this in VirtualBox on a very old computer but I don't think that should matter for IR?

rikardfalkeborn avatar Aug 03 '22 08:08 rikardfalkeborn

I also tested with constexpr, and gcc didn't seem to care about it, so I didn't add it. I'll see if I can reproduce the result with value flow disabled tonight.

rikardfalkeborn avatar Aug 03 '22 08:08 rikardfalkeborn

I'm using gcc 12.1.1, valgrind 3.19.0, linux. I did the measurements using callgrind when running cppcheck on main.cpp in simplecpp (I used --enable=all but if I recall correctly, that didn't matter).

mame_regtest is C only. I disable the valueflow for my tests since it slows things down to bad it's kinds unusable even outside of callgrind (but optimizing the actual core isn't bad either). main.cpp of simplecpp doesn't have much code at all - you should be using simplecpp.cpp instead - or better a compilation database of simplecpp.

Another thing I noticed was that make didn't realize it needed to recompile after changes in matchcompiler.h, but that's a separate issue.

Funny - I was wondering if CMake is correctly picking it up - and it does. Didn't think the Makefile didn't - but since it only applies to MATCHCOMPILER=1 builds that makes sense. Will take a look.

Finally, I'm running this in VirtualBox on a very old computer but I don't think that should matter for IR?

Not that I am aware of. perf is apparently much better since it counts the actual cycles used but I have zero experience with that and it requires kernel support so me using WSL1 (which doesn't have a kernel) cannot utilize it. llvm-mca looks at the cost of the actual instructions but that requires an actual code sample. I only used it once so I am not too familiar with it either.

firewave avatar Aug 03 '22 11:08 firewave

This looks really interesting. Could you also provide a small diff example of the generated code?

So given the following silly example file:

bool isF(const Scope *s)
{
        for (const Token* tok = scope->bodyStart; tok != scope->bodyEnd; tok = tok->next()) {
                if (tok->str() == "f")
                        return true;
        }
}

Previously, MatchCompiler generated this:

#include "matchcompiler.h"
#include <string>
#include <cstring>
bool isF(const Scope *s)
{
        for (const Token* tok = scope->bodyStart; tok != scope->bodyEnd; tok = tok->next()) {
                if (tok->str() == MatchCompiler::makeConstString("f"))
                        return true;
        }
}

With these changes, this is generated:

#include "matchcompiler.h"
#include <string>
#include <cstring>
static const auto constString0 = MatchCompiler::makeConstString("f");
bool isF(const Scope *s)
{
        for (const Token* tok = scope->bodyStart; tok != scope->bodyEnd; tok = tok->next()) {
                if (tok->str() == constString0)
                        return true;
        }
}

rikardfalkeborn avatar Aug 03 '22 21:08 rikardfalkeborn

So, my results with clang-14 (still using main.cpp in simplecpp) is the improvements with this PR improves IR with clang too (without the constexpr), but I guess I should try a bigger file (but it takes forever with callgrind). The improvements are actually bigger with clang compared to gcc, but this clearly needs more investigation...

rikardfalkeborn avatar Aug 03 '22 23:08 rikardfalkeborn

Another thing I noticed was that make didn't realize it needed to recompile after changes in matchcompiler.h, but that's a separate issue.

I filed https://trac.cppcheck.net/ticket/11241 with a suggestion on how to fix it.

firewave avatar Aug 06 '22 11:08 firewave

I did some tests with godbolt and llvm-mca and none of the variations have any effect the code at all in a very simple test case. So it seems to be related to more complex functions.

Here's a link to play around with: https://godbolt.org/z/K9nvqrzax.

firewave avatar Sep 24 '22 16:09 firewave

We now have a callgrind step in the selfcheck so you get information about the impact. So please rebase this PR.

firewave avatar Oct 05 '22 20:10 firewave

You can also download the callgrind.<pid> from the workflow and view it locally with kcachegrind. The callgrind_annotate output to be a off on some functions.

On a side note. Being able to optimize this is good, but as seen in the callgrind output the problem are the non-matchcompiler Token::Match() calls.

firewave avatar Oct 07 '22 11:10 firewave

Right, I rebased and tested it on the latest master. With the constexpr-changes you suggested, there is a slight improvement, but not as much as I initially thought. So maybe we just close this?

rikardfalkeborn avatar Oct 07 '22 21:10 rikardfalkeborn

Yeah, the changes unfortunately don't seem to have that much effect. I will take another look in the next few days. Let's leave it open for now.

Still Clang profits from it and it would be great to get a sample to file a report upstream.

firewave avatar Oct 07 '22 21:10 firewave

Some std::string comparison related benchmarks: https://github.com/llvm/llvm-project/issues/58002#issuecomment-1396950031

I will also do some tests with the stuff we discussed in this PR.

firewave avatar Jan 19 '23 13:01 firewave

short string: Clang 15 - https://quick-bench.com/q/uMa0PbTdG7CZhp94pk2-aZ9VySE GCC 12.2 - https://quick-bench.com/q/HehK-2CaXR4-g3znvZlh21tzdH4

GCC seems to yield more consistent code compared to Clang.

longer string: Clang 15 - https://quick-bench.com/q/2jfuoRSk5wq5bQrvRaqC-gRZxKE GCC 12.2 - https://quick-bench.com/q/_3hA3P1HGN_7juNpdznXvn3iTps

constexpr hurts GCC but produces more consistent results with Clang.

firewave avatar Jan 19 '23 13:01 firewave

Turns out all of this is not necessary - see https://github.com/llvm/llvm-project/issues/58002#issuecomment-1396950031. Simply moving the string to compare with into a global static const std::string object is enough to produce even faster results than the MatchCompiler. Clang is slightly slower with single character strings but GCC is faster. So that optimization should be kept for now. We could also make it optional based on if the compiler has __clang__ defined though.

firewave avatar Jan 19 '23 20:01 firewave

I updated your changes to apply my findings and added some additional improvements. With those we no longer have need for matchcompiler.h which would be great. See https://github.com/firewave/cppcheck/tree/mc-opt with the changes applied in bulk.

Unfortunately callgrind gives Ir regressions on that.

GCC 12 1,910,631,872 - current 2,148,962,096 - static const std::string 2,057,846,913 - static const std::string + single-character optimization

But given these functions are called millions of times a change of just a few instructions will have a visible impact. But as more instructions does not mean more cycles the code might actually be faster. So we need actual cycle profiling for this.

firewave avatar Jan 20 '23 12:01 firewave