runtime
Add dynamic shared memory allocation
This adds a parameter that allocates a given amount of dynamic shared memory at kernel launch. Wrapper functions that simply pass 0 are provided for backwards compatibility with existing code. This is currently implemented for CUDA only; other platforms will raise an error.
Corresponding thorin changes: https://github.com/AnyDSL/thorin/pull/144
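To illustrate the wrapper approach described above, here is a minimal C++ sketch. It is not the code from this PR: `Platform`, `Dim3`, `LaunchConfig`, and `make_launch_config` are hypothetical names chosen for the example. It also assumes that non-CUDA platforms reject only nonzero requests; the description does not say whether a zero-byte request errors as well.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

// Hypothetical platform tag; the runtime's real platform abstraction may differ.
enum class Platform { Cuda, Other };

using Dim3 = std::array<std::uint32_t, 3>;

// Hypothetical record of the values a launch would hand to the backend.
struct LaunchConfig {
    Dim3 grid;
    Dim3 block;
    std::uint32_t dynamic_shared_bytes;
};

// Extended entry point: the caller states how many bytes of dynamic shared
// memory the kernel needs. On CUDA, this value would end up as the
// sharedMemBytes argument of cuLaunchKernel.
inline LaunchConfig make_launch_config(Platform platform, Dim3 grid, Dim3 block,
                                       std::uint32_t dynamic_shared_bytes) {
    // Assumption: only a nonzero request is an error on non-CUDA platforms.
    if (platform != Platform::Cuda && dynamic_shared_bytes != 0) {
        throw std::runtime_error(
            "dynamic shared memory is only implemented for CUDA");
    }
    return LaunchConfig{grid, block, dynamic_shared_bytes};
}

// Backwards-compatible wrapper: keeps the old signature and passes 0,
// so existing call sites compile and behave as before.
inline LaunchConfig make_launch_config(Platform platform, Dim3 grid, Dim3 block) {
    return make_launch_config(platform, grid, block, 0);
}
```

Under these assumptions, existing callers keep using the three-argument form unchanged, while new callers opt in by passing a byte count as the fourth argument.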