snmalloc: How to use the new hardening features

I am not seeing how to use the new hardening features when including snmalloc as a header-only library on Windows. There doesn't seem to be any #define or similar mentioned in the documentation? Is it just enabled by default?

May 10 '22 14:05 Zeblote

It is enabled with

SNMALLOC_CHECK_CLIENT

I will update the documentation. That does not enable the memcpy protection.
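For a header-only build, the define has to be visible before the first snmalloc include in each translation unit, or be passed on the compiler command line (for example `/DSNMALLOC_CHECK_CLIENT` with MSVC). A minimal sketch follows. The include path `snmalloc/snmalloc.h` is an assumption about how the sources sit on your include path, and `hardening_requested` is a hypothetical helper, not part of snmalloc:

```cpp
// Define the macro before the first snmalloc include in the translation unit.
#define SNMALLOC_CHECK_CLIENT

// Assumed include path; adjust it to your checkout. The guard keeps this
// sketch compilable even when snmalloc is not on the include path.
#ifdef __has_include
#  if __has_include(<snmalloc/snmalloc.h>)
#    include <snmalloc/snmalloc.h>
#  endif
#endif

// Hypothetical helper (not part of snmalloc): reports whether this
// translation unit was compiled with the hardening macro defined.
constexpr bool hardening_requested()
{
#ifdef SNMALLOC_CHECK_CLIENT
  return true;
#else
  return false;
#endif
}

// Fails the build early if the define was dropped or placed too late.
static_assert(hardening_requested(), "SNMALLOC_CHECK_CLIENT is not defined");
```

Defining it on the command line is probably the safer choice, since it cannot end up after an include by accident.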

On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.

May 10 '22 14:05 mjp41

Thanks! I made a quick test comparing this new version against FMallocBinned2, though whether this can be considered a proper benchmark is questionable. Same integration code as my previous issue https://github.com/microsoft/snmalloc/issues/478.

FMallocBinned2, memory usage:

Start game: [screenshot perfmon_2022-05-10_17-02-33]

Open map: [screenshot perfmon_2022-05-10_17-02-51]

Load large save: [screenshot perfmon_2022-05-10_17-03-42]

Clear large save: [screenshot perfmon_2022-05-10_17-04-27]

Load large save again: [screenshot perfmon_2022-05-10_17-05-24]

snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, memory usage:

Start game: [screenshot perfmon_2022-05-10_17-15-10]

Open map: [screenshot perfmon_2022-05-10_17-15-24]

Load large save: [screenshot perfmon_2022-05-10_17-16-08]

Clear large save: [screenshot perfmon_2022-05-10_17-17-22]

Load large save again: [screenshot perfmon_2022-05-10_17-18-11]

It seems like snmalloc doesn't return unused memory nearly as much as the default Unreal allocator. Overall usage is also a bit higher. On memory, this seems to be a clear win for FMallocBinned2.

FMallocBinned2, performance: [screenshot Brickadia-Win64-Shipping_2022-05-10_17-05-45]

snmalloc 0.6 with SNMALLOC_CHECK_CLIENT, performance: [screenshot Brickadia-Win64-Shipping_2022-05-10_17-18-24]

Only slightly slower on allocations, but much faster on deallocations. This seems to be a win for snmalloc overall.

The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.

May 10 '22 15:05 Zeblote

Thanks for running this benchmark. The CHECK_CLIENT version will use more memory. It needs to keep enough spare space for randomisation to have enough entropy. There are a few cases where it decides to allocate more memory just to increase randomness in allocation patterns.

> The game also seems to crash on exit when snmalloc is used but I haven't yet investigated why.

CHECK_CLIENT does a lot of its checking lazily. We check consistency at the end to force any lazy checks that have not yet run. My guess is that it has found corruption during teardown. I would be very interested to know if that is the case.

May 10 '22 16:05 mjp41

> On Windows the performance of the memcpy protection is not great due to the way virtual memory works. We can't map a large accessible zero range without actually committing the memory. We could add a Vectored Exception handler, but then it is going to harm the debugging experience.

Can you explain why this is required? Is it to support allocations from outside snmalloc?

For example, in Unreal Engine, there are over 1000 calls to FMemory::Memcpy in runtime modules, the absolute vast majority only ever operating on memory obtained from FMemory. It would probably be a good start to guard all of those.

Debugging experience is not really important for distribution builds. Though if it interfered with Unreal's crash reporting, that would be a problem.

May 15 '22 14:05 Zeblote

Yeah, it is to support allocations from outside snmalloc. The key issue is the "absolute vast majority": if even one call isn't, and we crash because of it, then customers will get very annoyed. For instance,

   char buffer[256];
   // Copy at most sizeof(buffer) bytes from src into the stack buffer.
   memcpy(buffer, src, std::min<size_t>(sizeof(buffer), src_size));

This needs the stack allocation of buffer to not create an access violation when we look up the meta-data. On Linux, for instance, we mmap a 256GiB region of memory, and then any access is allowed. On Windows, we MEM_RESERVE a 256GiB region, and then MEM_COMMIT the bits that correspond to snmalloc allocations. This means that accessing the meta-data on Windows for anything that doesn't correspond to an allocation will cause an access violation. There are three approaches we could take:

1. Any attempt to access meta-data that is not guaranteed to be snmalloc controlled causes a call to MEM_COMMIT. This is the current implementation and is very slow.
2. We could install a VectoredExceptionHandler to detect accesses to meta-data that is not committed, and commit it. This is much cheaper, but leads to a poor debugging experience. The debugger will probably trap the access violation before the handler, and lead users to believe that something is wrong when it isn't.
3. Install shared 2MiB zero pages for the whole range outside that currently used by snmalloc. This would incur a few MiB of commit charge for the associated page tables and the shared large page. It would not involve any exceptions, so debugging would be fine. It is, however, a pretty complex change.
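
The first approach can be modelled without any Windows APIs. The sketch below is a toy, not snmalloc's pagemap: `ToyPagemap`, its page size and its counter are hypothetical names, a `std::vector` stands in for the reserved range, and one boolean per page stands in for the MEM_COMMIT state. It only aims to show why lookups for memory snmalloc does not own (such as `buffer` above) end up on the slow path.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a flat meta-data table over a reserved address range.
// All names and sizes here are hypothetical, not snmalloc's.
class ToyPagemap
{
  static constexpr std::size_t entries_per_page = 512;

  std::vector<bool> committed_;        // stands in for MEM_COMMIT state
  std::vector<std::uint32_t> entries_; // stands in for the reserved range
  std::size_t slow_commits_ = 0;

  // Returns true if the page had to be committed by this call.
  // The index must be below pages * entries_per_page.
  bool ensure_committed(std::size_t index)
  {
    std::size_t page = index / entries_per_page;
    if (committed_[page])
      return false;
    committed_[page] = true;
    return true;
  }

public:
  explicit ToyPagemap(std::size_t pages)
  : committed_(pages, false), entries_(pages * entries_per_page, 0)
  {}

  // Allocator path: meta-data is written when memory is handed out, so the
  // page is committed as part of allocation.
  void set(std::size_t index, std::uint32_t value)
  {
    ensure_committed(index);
    entries_[index] = value;
  }

  // memcpy-check path: the index may belong to memory the allocator never
  // handed out, so the page may need committing before it can be read.
  std::uint32_t get(std::size_t index)
  {
    if (ensure_committed(index))
      ++slow_commits_; // models the expensive commit call
    return entries_[index];
  }

  std::size_t slow_commits() const { return slow_commits_; }
};
```

In this model, `get` on an index that was previously `set` never touches the counter, while the first `get` for an index on an untouched page increments it once. Later lookups on the same page are cheap again.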

I prototyped the third option a couple of years ago, and I think given the new refactoring in 0.6.0 it would fit quite nicely. I don't currently have time to implement it though.

May 16 '22 08:05 mjp41

Oh, you're right. I completely forgot about stack allocations. Those are very common, so it would be unusable.

Hope you'll find time to implement a well-performing solution at some point!

May 16 '22 11:05 Zeblote