Dillon Sharlet
I found that this target flag speeds up many of the correctness tests substantially (e.g. correctness_vector_reductions goes from 42s to 27s). This change might dramatically speed up buildbot testing if...
Here's a simpler version of that reproducer: ``` { count = 0; Func f1, f2, g1, g2, h; f1(y) = call_counter(y, 0); f2(y) = f1(y); g1(y) = f2(y - 1)...
Certainly not intentionally... I think it's more likely that some of your changes are fixing it. Something to check is that it may be failing to use the new strategy,...
I just checked, camera_pipe is still using the new strategy at least. And there are some tests that should fail if it regressed to the old strategy in some cases...
We now have another stack usage regression causing crashes. Is this PR something that is ready to merge early this week?
I think this is a good change, but maybe we should go ahead and do it for all the apps? The reason I think it is good is...
A bit of data here: #5548 reduces the number of non-LLVM symbols in libHalide.so by 25-50%, to around 3,000 symbols. However... there are ~30,000 symbols coming from LLVM. It seems...
I guess if anything in Halide.h uses LLVM types, that might not work well.
Maybe the better fix is to just use an actual mutex?
I think the main reason we've avoided a mutex here is not performance but messiness in initialization. Can we zero initialize a global qurt_mutex_t?
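To illustrate the initialization concern, here is a minimal sketch of a lock whose unlocked state is all zeros, so a global instance needs no init call before first use. `ZeroInitSpinLock`, `global_lock`, and `guarded_increment` are hypothetical names for this example only; this is not Halide's runtime code and it does not use the QuRT API. Whether a zeroed `qurt_mutex_t` is valid is a separate question that this sketch does not answer.

```cpp
#include <atomic>

// Hypothetical lock type for illustration; not Halide's or QuRT's API.
// The unlocked state is 0, so a global instance is constant-initialized
// at load time and no runtime init function has to run first.
struct ZeroInitSpinLock {
    std::atomic<int> state{0};  // 0 = unlocked, 1 = locked

    // Attempt to take the lock once; returns true on success.
    bool try_lock() {
        int expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    // Spin until the lock is acquired.
    void lock() {
        while (!try_lock()) {
            // Busy-wait; a real runtime would likely yield or back off here.
        }
    }

    void unlock() {
        state.store(0, std::memory_order_release);
    }
};

// Static storage duration: usable by any code, including other
// static initializers, without an explicit init step.
ZeroInitSpinLock global_lock;
int shared_counter = 0;

// Increment the shared counter under the lock and return the new value.
int guarded_increment() {
    global_lock.lock();
    int result = ++shared_counter;
    global_lock.unlock();
    return result;
}
```

A real mutex typically avoids the busy-wait, but it usually needs an explicit init call, and ordering that call against the first use is the messiness mentioned above.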