Benjamin Worpitz
I accidentally had `RC_PARAMS` with a specific `reproduce` value in my environment that was related to a specific test executable. While extending expectations in a different test executable I was surprised...
We should investigate adding a coroutine-based backend. It should be similar to the fiber-based backend (cooperative multitasking within a block, possibly multithreaded block execution). Coroutines are supported...
Currently `AtomicStdLibLock` has a static mutex hash table which allows it to synchronize between all grids executed within a process. However, this is not documented and does not conform to...
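A minimal sketch of what a static mutex hash table can look like, to make the process-wide effect concrete. This is not the actual `AtomicStdLibLock` code: the names `mutexFor`, `atomicAddLocked` and `mutexCount` are hypothetical, and the table size is an arbitrary assumption. Because the table is a function-local `static`, every caller in the process shares the same mutexes, regardless of which grid it belongs to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical table size; chosen arbitrarily for this sketch.
constexpr std::size_t mutexCount = 64;

// Maps an address to one mutex of a process-wide table.
// The table is static, so all callers in the process share it.
inline std::mutex& mutexFor(void const* addr)
{
    static std::array<std::mutex, mutexCount> table;
    auto const hash = std::hash<void const*>{}(addr);
    return table[hash % mutexCount];
}

// Hypothetical lock-based atomic add: returns the old value,
// as atomic add operations conventionally do.
template<typename T>
T atomicAddLocked(T* addr, T value)
{
    std::lock_guard<std::mutex> lock(mutexFor(addr));
    T const old = *addr;
    *addr = old + value;
    return old;
}
```

Two different addresses may hash to the same mutex. That costs some contention but does not affect correctness, since each operation holds only one lock at a time.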
This is the follow-up to an offline discussion. It is not yet a real issue. Alpaka enforces that kernel arguments are taken either by value or by `const &`....
Executing the fibers in random order prevents memory prefetching. By iterating X first, Y second, Z third (native C memory order), we would assist the prefetcher by using the expected default...
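The X-first iteration order could be sketched as follows. This assumes that X is the fastest-varying dimension in memory; `Idx3`, `linearOffset` and `orderedThreadIndices` are hypothetical names, not alpaka API. Visiting the indices in this order touches consecutive linear offsets, which is the access pattern a hardware prefetcher handles best.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D index type for this sketch: {x, y, z}.
using Idx3 = std::array<std::size_t, 3>;

// Linear offset of (x, y, z), assuming X varies fastest in memory.
inline std::size_t linearOffset(Idx3 const& idx, Idx3 const& extent)
{
    return idx[0] + extent[0] * (idx[1] + extent[1] * idx[2]);
}

// Returns all indices of a block in the order X 1st, Y 2nd, Z 3rd,
// i.e. X is the innermost loop.
inline std::vector<Idx3> orderedThreadIndices(Idx3 const& extent)
{
    std::vector<Idx3> order;
    order.reserve(extent[0] * extent[1] * extent[2]);
    for(std::size_t z = 0; z < extent[2]; ++z)
        for(std::size_t y = 0; y < extent[1]; ++y)
            for(std::size_t x = 0; x < extent[0]; ++x)
                order.push_back(Idx3{{x, y, z}});
    return order;
}
```

A scheduler that resumes fibers in the order returned by `orderedThreadIndices` would walk memory linearly as long as each fiber accesses the element at its own index.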
Enhance the fibers implementation by parallelizing the execution of the blocks.
There should not be direct access to memory buffers. Such access always implies knowledge about the memory layout (row-major or column-major), which is not necessarily correct on the underlying accelerator.
To allow methods that only need const memory to avoid double buffering on the host, a method `copyIfDifferentMem` should be implemented.
Is it allowed for the host code to be multithreaded itself? Restricting it is neither useful nor realistically enforceable. Maybe one thread calculates something using a CUDA device while the...
`cudaMalloc` -> `cudaMallocManaged`; `cudaMemcpy` -> `cudaMemPrefetchAsync`; `cudaMemAdvise` for pinning? The alloc function currently takes a device. If this is `DevGpuCuda`, the CUDA versions are called. We would need something like...