Thanks! I am currently working on a new version of the inference runtime that eliminates waiting time in some CPU-bound cases, and multi-device inference is on its roadmap!
> Well, that's a frontend bug; I'll fix it today. As for Apple, it should be the Metal backend being filtered out by the web frontend.

The frontend has been updated in v0.5.14; please try again.
@cgisky1980
Actually the prefill chunk size maxes out at about 256; going higher isn't worth it. It is not a limit on the total token length. It...
I see. There is a 4k limit in the backend for a single request. I can remove it anyway.
The limit has been removed.
Thanks! There is a C FFI ([here](https://github.com/cryscan/web-rwkv-ffi)). It's not as flexible, but it is simple to use and extend. Feel free to extend it for your own usage, or reach...
Ah, I've made it public now.
> The link https://github.com/cryscan/web-rwkv-ffi seems to be dead (404) Is that helpful to your application?