llama : support RWKV v6 models
This should fix #846.
Added:

ggml:
- Added unary operation `Exp`
- Added `rwkv_wkv` operation with CPU impl
- Added `rwkv_token_shift` operation with CPU impl to handle multiple sequences in parallel (may not be necessary after #8526 is done)
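For reference, a minimal numpy sketch of the per-head WKV recurrence that the `rwkv_wkv` operation computes (this is a simplified single-head, float64 reference, not the actual ggml implementation; the function name and argument layout here are illustrative):

```python
import numpy as np

def rwkv6_wkv_ref(r, k, v, w, u):
    """Reference RWKV v6 WKV recurrence for one head.

    r, k, v, w: (T, D) arrays -- receptance, key, value, per-token decay.
    u: (D,) bonus applied to the current token's contribution.
    Returns the (T, D) output sequence.
    """
    T, D = r.shape
    S = np.zeros((D, D))            # recurrent state, one (D, D) matrix per head
    out = np.zeros((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])   # rank-1 update from the current token
        # current token gets the "bonus" u instead of the decayed state
        out[t] = r[t] @ (S + u[:, None] * kv)
        # v6: decay w is data-dependent (per token and per channel)
        S = w[t][:, None] * S + kv
    return out
```

The state `S` is what gets carried across sequence chunks, which is why the kernel has to track one state matrix per head per sequence.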
llama.cpp:
- `rwkv_world` tokenizer support (by @LaylBongers)
- `convert_hf_to_gguf.py` support for converting RWKV v6 HF models
- RWKV v6 graph building
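The token-shift mixing that the graph relies on can be sketched as follows (an illustrative numpy reference, assuming a learned mixing coefficient `mu`; not the ggml graph code itself):

```python
import numpy as np

def token_shift_mix(x, last, mu):
    """Mix each token's embedding with the previous token's embedding.

    x:    (T, D) current chunk of hidden states.
    last: (D,)   final hidden state of the previous chunk (the carried state).
    mu:   (D,)   per-channel mixing coefficient (learned in the model).
    """
    # shift the sequence right by one; position 0 sees the carried state
    shifted = np.concatenate([last[None, :], x[:-1]], axis=0)
    return x * mu + shifted * (1.0 - mu)
```

Because `last` comes from the previous chunk, batched inference over multiple sequences needs one carried state per sequence, which is what the parallel `rwkv_token_shift` handling above is for.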
TODO:
- Make the corresponding adjustments once #8526 is ready
- Add a CUDA or Metal implementation for the `rwkv_wkv` operation
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [x] Medium