Ryan


Same here, it's much, much slower. Without GPU offloading it's close to 80 ms per token in my case, but with offloading it's around 700 ms per token...

It also takes longer to load the model.