Ryan

Generation using GPU offloading is much slower than without

Same here, it's much, much slower with GPU offloading. In my case it's close to 80 ms per token without offloading, but around 700 ms per token with offloading...
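The gap is easier to compare as throughput. Below is a small Python sketch that converts the approximate latencies quoted in the comment ("80ish" and "700ish" ms per token) into tokens per second and a slowdown factor. The numbers are the comment's rough figures, not new measurements, and the function name is illustrative only.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency in milliseconds to throughput in tokens/s."""
    return 1000.0 / ms_per_token

# Approximate figures from the comment; not precise benchmarks.
no_offload_ms = 80.0
offload_ms = 700.0

print(f"without offloading: {tokens_per_second(no_offload_ms):.1f} tokens/s")
print(f"with offloading:    {tokens_per_second(offload_ms):.2f} tokens/s")
print(f"slowdown:           {offload_ms / no_offload_ms:.2f}x")
```

With these figures, offloading drops throughput from about 12.5 to about 1.43 tokens/s, roughly 8.75x slower.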

Generation using GPU offloading is much slower than without

It also takes longer to load the model.

Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-2: VK_ERROR_OUT_OF_DEVICE_MEMORY Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.

I am on Windows.