Basic RWKV6 support
Conversion, loading, and LoRA all work, but the computation graph is broken: I cannot get the code around line 348 of `rwkv_graph.inc` right no matter what I try. I have marked the spot with exclamation marks.
Hi!
Although I've stepped down as the maintainer of rwkv.cpp, I think my input will still be valuable:
- ~~`rwkv_att_v6` looks like a naive (as in "creates and operates on full-blown matrices") implementation of the attention. If it's true, then v6 models will be too slow and probably unusable on large context lengths. See comparisons for v5. I recommend pulling an optimized implementation from somewhere and adapting it into `rwkv.cpp`, as was done with v5.~~
- There are no tests for v6 models, no Tiny RWKV trained, etc. I would not merge v6 support without proper tests added, because it is asking for something to be unknowingly broken.
Edit: I may be mistaken about the performance issues, because `rwkv_att_v6` actually calls `rwkv_wkv_v5`. In any case, I recommend measuring latency and memory usage and comparing them with v5 models.
The second point still stands -- there must be quality assurance.
Hey there! Thanks for the pull request for the long-awaited RWKV V6 support. I'll soon get to reviewing the code, it's on my schedule! Unfortunately I cannot read Chinese, so I've had to get some help translating. As already mentioned, to make sure support is (and continues to be) functional, we need to include some tests.
Sorry, it's not convenient for me to reply while at school. This PR only supports the conversion and loading of RWKV6. The file `rwkv_graph.inc` has an issue near lines 342-347 that I cannot solve; no matter how I modify it, I hit errors such as `GGML_ASSERT: /media/yuchuxi/YuZi/Project/Mozi/rwkv.cpp/ggml/src/ggml.c:6499: ggml_is_contiguous(a)`.
I'll go find a small model to test with.
RWKV v6 support was merged in another PR.