Comparability of Benchmarks and Optimization
Following issue addresses a problem in benchmark comparability. It does not contain a final solution, but a few ideas and suggestions and also invites others to add more ideas on this topic.
Fortran has been unusually fast for a while, producing the Fibonacci benchmark in almost zero time. PR #401 shows some massive performance differences for gfortran for compile-time precomputation versus run-time computation.
I'm a little undecided. On the one hand, it speaks for this compiler, on the other hand, it just looks like a special rule for Fibonacci algorithm was triggered here.
This means that the benchmark has lost a lot of its meaning, since recursion is no longer considered at all. Ultimately, the visualization no longer shows how fast Fortran could be with regard to recursion, but simply how fast the program can output a precalculated value in this one special case.
It was repeatedly emphasized that doing the same amount of "work" is important.
Similarly, the Loops Benchmark does too much work and measures the content of the loops rather than the loops themselves. Nevertheless, eliminating the loops was not desired.
Of course, I also understand that it is difficult to set a limit on how much manual or automatic optimization can take place. But somehow it defeats the purpose of a benchmark if the work to be measured is no longer even done.
But let's say in this one example that we were not interested in recursion in general but specifically in Fibonacci. Now, with a little help, you could possibly achieve similar results in some languages for a known problem. The issue is, giving hints is currently not allowed anyway.
So there is a discrepancy depending on whether certain languages happen to be very suitable for a particular problem or whether they shift the focus from automation more towards manual control.
I think it's difficult when some languages can sometimes accidentally hit one of their special optimization rules, but others are artificially held back. One possibility would be to remove the optimizations for Fortran. This would better represent recursion itself, but would also be a somewhat unfair artificial downgrade.
Perhaps certain further optimizations could be allowed instead:
- Compiler Hints
- More freedom in problem solving itself
The former would better represent the potential capabilities of compilers and would not create an artificial disadvantage for languages that can work with them. The focus would be on what the compilers can achieve from a set of certain similar instructions.
The latter shifts the focus more to a clever problem solution itself. In the legacy benchmark for Levenshtein, for example, it would have been possible to reduce the number of comparisons required without changing the algorithm itself. Unfortunately, this means that the experience of an individual developer with regard to a particular problem would carry much more weight, but at the same time it may also allow for more creative adaptations for a particular language.
In future benchmarks, we should make sure that these are not everyday problems that have already been implemented many times in a very similar way and that could very likely trigger a special optimization rule.