Using an externally provided thread-local allocator causes heap corruption when unloading dxcompiler.dll
There is a hashToIdxMap in DxilShaderModel.cpp that is a static local variable in ShaderModel::Get. It is initialized on the first call, and since operator new/delete is overridden, it allocates from whichever allocator is installed at that moment. If an external call site provides its own allocator, this "operator new" call is routed to the external allocator, and the static std::unordered_map hashToIdxMap ends up holding an allocation from that allocator.
When dxcompiler.dll unloads, static deinitialization runs and hashToIdxMap is freed. No thread-local allocator is available at that point, so the free falls back to the default COM allocator, which does not own that allocation, and heap corruption results.
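To make the sequence above concrete, here is a minimal simulation of the mismatch. It does not use any DXC code: `TrackingHeap`, `g_defaultHeap`, `tl_heap`, `CurrentHeap` and `SimulateMismatch` are all hypothetical names invented for illustration, and the "heaps" only record which pointers they handed out, so the mismatch is reported instead of actually corrupting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical stand-in for an IMalloc-style heap. It remembers which
// blocks it owns so that a foreign free can be detected instead of
// corrupting memory.
struct TrackingHeap {
  std::set<void*> live;

  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) live.insert(p);
    return p;
  }

  // Returns false if this heap does not own p. In a real heap this is the
  // point where corruption would happen.
  bool Free(void* p) {
    if (live.erase(p) == 0) return false;
    std::free(p);
    return true;
  }
};

TrackingHeap g_defaultHeap;                    // plays the default COM allocator
thread_local TrackingHeap* tl_heap = nullptr;  // plays the user-provided allocator

// Routing rule assumed from the description: use the thread-local
// allocator if one is installed, otherwise fall back to the default.
TrackingHeap& CurrentHeap() { return tl_heap ? *tl_heap : g_defaultHeap; }

// Returns true if the shutdown-time free reached the owning heap.
bool SimulateMismatch() {
  TrackingHeap userHeap;
  tl_heap = &userHeap;                      // caller installs its allocator
  void* block = CurrentHeap().Alloc(64);    // lazy static init on first call
  tl_heap = nullptr;                        // call returns, allocator removed
  bool freedByOwner = CurrentHeap().Free(block);  // static destructor at unload
  userHeap.Free(block);                     // tidy up so the demo does not leak
  return freedByOwner;
}
```

In this sketch `SimulateMismatch()` returns `false`: the block was allocated from `userHeap`, but the shutdown-time free is routed to `g_defaultHeap`.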
My suggestion is to explicitly initialize hashToIdxMap after the default COM allocator has been initialized, or to give that unordered_map a custom std::allocator that doesn't go through the default operator new/delete.
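A rough sketch of the second option, assuming it is acceptable for the map to take its memory straight from `malloc`/`free`. `MallocAllocator`, `HashToIdxMap`, `MakeShaderHash`, `LookupShaderModelIndex` and the table entries are hypothetical placeholders, not the actual contents of DxilShaderModel.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Hypothetical allocator that always uses malloc/free, so the container
// never touches the overridden global operator new/delete.
template <typename T>
struct MallocAllocator {
  using value_type = T;

  MallocAllocator() noexcept = default;
  template <typename U>
  MallocAllocator(const MallocAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// All instances are interchangeable because they share the same heap.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return false;
}

using HashToIdxMap =
    std::unordered_map<unsigned, unsigned, std::hash<unsigned>,
                       std::equal_to<unsigned>,
                       MallocAllocator<std::pair<const unsigned, unsigned>>>;

constexpr unsigned kInvalidIndex = ~0u;

// Assumed hash layout, for illustration only.
constexpr unsigned MakeShaderHash(unsigned kind, unsigned major, unsigned minor) {
  return (kind << 16) | (major << 8) | minor;
}

unsigned LookupShaderModelIndex(unsigned hash) {
  // Placeholder entries. The map's nodes and buckets come from malloc, so
  // the destructor at unload frees them with the matching free.
  static const HashToIdxMap hashToIdxMap = {
      {MakeShaderHash(0, 6, 0), 0},
      {MakeShaderHash(1, 6, 0), 1},
      {MakeShaderHash(5, 6, 0), 2},
  };
  auto it = hashToIdxMap.find(hash);
  return it == hashToIdxMap.end() ? kInvalidIndex : it->second;
}
```

With this, the map is still destroyed during static deinitialization, but its memory is allocated and released by the same heap regardless of which allocator the caller installed at first use.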
Is there any progress on this one? It's preventing us from using a thread-local allocator, instead having to fall back to a global one, with all the thread contention that implies.
Could the unordered_map maybe just be converted to a switch statement? Probably slower, but maybe not critically?
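For illustration, a switch-based lookup could look roughly like the following. The hash layout, the `MakeHash` and `LookupIndexSwitch` names, and the entries are assumptions made for this sketch rather than the real table. The point is only that nothing is heap-allocated, so there is nothing to free at unload.

```cpp
#include <cassert>

constexpr unsigned kNoIndex = ~0u;

// Assumed hash layout, for illustration only.
constexpr unsigned MakeHash(unsigned kind, unsigned major, unsigned minor) {
  return (kind << 16) | (major << 8) | minor;
}

// Hypothetical replacement for the map lookup. The table lives in code,
// so no allocator is involved at first call or at shutdown.
unsigned LookupIndexSwitch(unsigned hash) {
  switch (hash) {
  case MakeHash(0, 6, 0): return 0;  // placeholder entry
  case MakeHash(1, 6, 0): return 1;  // placeholder entry
  case MakeHash(5, 6, 0): return 2;  // placeholder entry
  default:                return kNoIndex;
  }
}
```

Whether this is slower than the hash map depends on how the compiler lowers the switch (jump table, binary search, or a chain of compares), so it would need measuring.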
Hello again. I've been investigating further crashes during shutdown of the compiler dlls (on console this time), and found several more places where lazy-initialised global variables are allocated on the user-provided IMalloc, but then freed at shutdown (in llvm::llvm_shutdown) from the default COM malloc, causing heap corruption and crashes. Basically anything using ManagedStatic seems liable to cause this problem.
The unfortunate conclusion I have reached is that there is no safe way to provide a custom IMalloc. Is this a correct statement? Can anyone suggest a workaround? Using a thread-local allocator is a fairly significant performance boost, so it's a shame to lose this feature, but unfortunately we have to favour not-crashing over performance :)
Realistically we're not going to be able to find time to investigate this further. We know that memory allocation performance / customization is something we'll need to look at in the HLSL/Clang project, and this is where an issue like this would be addressed.