Not all tracing times end up in the cache files
These lines in cost_func:
https://github.com/benvanwerkhoven/kernel_tuner/blob/master/kernel_tuner/strategies/minimize.py#L121-L130
run after the config has been stored in the cache, so the times computed there never end up in the cache file. In addition, brute_force does not use cost_func at all, so it cannot produce these trace timings this way. Both issues can probably be solved by moving these lines into the runner.
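A minimal sketch of what moving these lines into the runner could look like. Everything here is hypothetical and not kernel_tuner's actual API: run_config, compile_kernel, benchmark_kernel, the cache key format and the millisecond units are made up for illustration. The point is the ordering. Every timing is attached to the result before it is written to the cache, and because the runner does the timing, every strategy (brute_force included) gets the same timings.

```python
import time

# Hypothetical sketch, not kernel_tuner's actual runner API.

def compile_kernel(config):
    # hypothetical stand-in for compiling the kernel for this config
    time.sleep(0.001)

def benchmark_kernel(config):
    # hypothetical stand-in for benchmarking; returns a kernel time in ms
    time.sleep(0.001)
    return 1.0 + config["block_size_x"] / 1000.0

def run_config(config, cache):
    """Time every phase of one config, then write the result to the cache last."""
    start = time.perf_counter()

    t0 = time.perf_counter()
    compile_kernel(config)
    compile_time = (time.perf_counter() - t0) * 1000.0  # seconds -> ms

    t0 = time.perf_counter()
    kernel_time = benchmark_kernel(config)
    benchmark_time = (time.perf_counter() - t0) * 1000.0  # seconds -> ms

    total = (time.perf_counter() - start) * 1000.0
    result = dict(config)
    result["time"] = kernel_time
    result["compile_time"] = compile_time
    result["benchmark_time"] = benchmark_time
    # whatever is not compiling or benchmarking is attributed to the framework
    result["framework_time"] = total - compile_time - benchmark_time

    # store only after all timings are attached, so the cache entry is complete
    key = ",".join(str(v) for v in config.values())
    cache[key] = result
    return result
```

Because the cache write is the last step, nothing computed afterwards (for example in a strategy's cost function) can silently miss the cache file.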
When tuning with a search strategy under a time limit, the simulation runner also needs to simulate the compile_time, framework_time, and so on that the original run spent; otherwise a simulated run can fit many more evaluations into the same time limit than a real run would. We can probably do this by advancing the clock used to check the time limit. The easiest way to support this is probably to accumulate the simulated times in tuning_options and have check_stop_criterion check whether we are running in a simulation.
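A rough sketch of the simulated clock idea. The tuning_options keys (simulation_mode, simulated_time, start_time, time_limit), the StopCriterionReached exception, the record_simulated_time helper and the assumption that cached times are in milliseconds are all hypothetical; the real check_stop_criterion may look quite different.

```python
import time

# Hypothetical sketch; names and keys are assumptions, not kernel_tuner's API.

class StopCriterionReached(Exception):
    """Hypothetical exception signalling that the time budget is used up."""

def record_simulated_time(tuning_options, cached_result):
    # In simulation mode nothing is compiled or benchmarked, so advance a
    # simulated clock by the time the original run spent on this config.
    # Cached times are assumed to be in milliseconds.
    spent_ms = sum(cached_result.get(k, 0.0)
                   for k in ("compile_time", "benchmark_time", "framework_time"))
    tuning_options["simulated_time"] += spent_ms / 1000.0  # ms -> s

def check_stop_criterion(tuning_options):
    # Use the simulated clock in simulation mode, the wall clock otherwise.
    if tuning_options["simulation_mode"]:
        elapsed = tuning_options["simulated_time"]
    else:
        elapsed = time.perf_counter() - tuning_options["start_time"]
    if elapsed > tuning_options["time_limit"]:
        raise StopCriterionReached(
            f"time limit of {tuning_options['time_limit']} s exceeded")
```

With this split, the strategies themselves do not need to know whether they run in a simulation; only the clock that check_stop_criterion reads changes.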
This also needs fixing, since perf_counter already returns seconds and the value should not be converted as if it were in another unit: https://github.com/benvanwerkhoven/kernel_tuner/blob/master/kernel_tuner/strategies/minimize.py#L122
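A quick check of the units, as a minimal sketch that does not reproduce the linked line itself; elapsed_ms is a hypothetical helper name. time.perf_counter() returns fractional seconds, so the difference of two readings is already in seconds, and converting to milliseconds takes a single multiplication by 1000.

```python
import time

def elapsed_ms(start):
    # perf_counter() is in seconds; the difference is in seconds too,
    # so multiply by 1000 exactly once to report milliseconds
    return (time.perf_counter() - start) * 1000.0

start = time.perf_counter()
time.sleep(0.05)
print(f"{elapsed_ms(start):.1f} ms")  # roughly 50 ms, not 0.05 or 50000
```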
I am currently rewriting _cost_func, the runners, and core (mostly compile_and_benchmark) to simplify the code, make it clearer which function is responsible for what, and ensure that every timing measures the right thing and ends up where it belongs.