llama : support RWKV v6 models
This should fix #846.
Added:

ggml:
- Added unary operation `Exp`
- Added `rwkv_wkv` operation with CPU impl
- Added `rwkv_token_shift` operation with CPU impl to handle multiple sequences in parallel (may not be necessary after #8526 is done)
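For reference, a minimal numpy sketch of the per-head WKV recurrence that the `rwkv_wkv` operation computes (this is a simplified single-head, float64 reference, not the actual ggml implementation; the function name and argument layout here are illustrative):

```python
import numpy as np

def rwkv6_wkv_ref(r, k, v, w, u):
    """Reference RWKV v6 WKV recurrence for one head.

    r, k, v, w: (T, D) arrays -- receptance, key, value, per-token decay.
    u: (D,) bonus applied to the current token's contribution.
    Returns the (T, D) output sequence.
    """
    T, D = r.shape
    S = np.zeros((D, D))            # recurrent state, one (D, D) matrix per head
    out = np.zeros((T, D))
    for t in range(T):
        kv = np.outer(k[t], v[t])   # rank-1 update from the current token
        # current token gets the "bonus" u instead of the decayed state
        out[t] = r[t] @ (S + u[:, None] * kv)
        # v6: decay w is data-dependent (per token and per channel)
        S = w[t][:, None] * S + kv
    return out
```

The state `S` is what gets carried across sequence chunks, which is why the kernel has to track one state matrix per head per sequence.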
llama.cpp:
- `rwkv_world` tokenizer support (by @LaylBongers)
- `convert_hf_to_gguf.py` support for converting RWKV v6 HF models
- RWKV v6 graph building
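The token-shift mixing that the graph relies on can be sketched as follows (an illustrative numpy reference, assuming a learned mixing coefficient `mu`; not the ggml graph code itself):

```python
import numpy as np

def token_shift_mix(x, last, mu):
    """Mix each token's embedding with the previous token's embedding.

    x:    (T, D) current chunk of hidden states.
    last: (D,)   final hidden state of the previous chunk (the carried state).
    mu:   (D,)   per-channel mixing coefficient (learned in the model).
    """
    # shift the sequence right by one; position 0 sees the carried state
    shifted = np.concatenate([last[None, :], x[:-1]], axis=0)
    return x * mu + shifted * (1.0 - mu)
```

Because `last` comes from the previous chunk, batched inference over multiple sequences needs one carried state per sequence, which is what the parallel `rwkv_token_shift` handling above is for.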
TODO:
- Make the corresponding adjustments once #8526 is ready
- Add a CUDA or Metal implementation for the `rwkv_wkv` operation
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [x] Medium