pace
`gt:gpu` slowdown on `gt4py` v1 due to change in allocation
gt4py v1 removes the `Storage` class and allows any object that describes itself through `__array_interface__` to be bound. Unfortunately, the default cupy allocation used in our model has a bad stride (it should have unit stride), which degrades performance in the backend.
Potential solutions:
- use the allocator provided by gt4py (which is optimized for the backend)
- make sure our GPU allocation has unit stride
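
To illustrate the second option, here is a minimal sketch of how an allocation can be forced to have unit stride along a chosen axis. It uses NumPy as a stand-in, on the assumption that the same `strides` / `transpose` logic carries over to cupy arrays. The helper names `allocate_with_unit_stride` and `unit_stride_axes` are hypothetical, and which axis the `gt:gpu` backend actually wants as the unit-stride axis is an assumption here (axis 0 is used only as an example).

```python
import numpy as np


def allocate_with_unit_stride(shape, dtype=np.float64, unit_axis=0):
    """Return a zeroed array of `shape` where `unit_axis` is contiguous in memory.

    Hypothetical helper: the choice of `unit_axis` is an assumption and
    should be matched to the layout the backend expects.
    """
    ndim = len(shape)
    # Memory order: every other axis first, the unit-stride axis last,
    # because the last axis of a C-contiguous buffer has unit stride.
    order = [ax for ax in range(ndim) if ax != unit_axis] + [unit_axis]
    buffer = np.zeros([shape[ax] for ax in order], dtype=dtype)
    # Inverse permutation: a view with the logical axis order restored.
    inverse = tuple(int(i) for i in np.argsort(order))
    return buffer.transpose(inverse)


def unit_stride_axes(array):
    """List the axes whose stride equals one element (itemsize bytes)."""
    return [ax for ax, s in enumerate(array.strides) if s == array.itemsize]


default_alloc = np.zeros((4, 5, 6))
fixed_alloc = allocate_with_unit_stride((4, 5, 6), unit_axis=0)

print(default_alloc.strides, unit_stride_axes(default_alloc))
print(fixed_alloc.strides, unit_stride_axes(fixed_alloc))
```

The default C-ordered allocation puts the unit stride on the last axis, while the helper keeps the same logical shape and moves the unit stride to the requested axis. A check like `unit_stride_axes` could also be used as a guard before binding arrays to a stencil.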