ZeroYuJie
Great job on this toolkit. I'm attempting to merge two models with differing `vocab_size`: `augmxnt/shisa-7b-v1` (base) and `teknium/OpenHermes-2.5-Mistral-7B`. The `augmxnt/shisa-7b-v1` model has an expanded `vocab_size`. However, after merging them...
I got the error `panic: concurrent map writes` in the BPE `TokenizeWithCache` function. Concurrent read and write operations on the map can lead to a panic. `func (b BPE) TokenizeWithCache(sequence string)`...
I found an example that uses Flask to serve API requests. I gave it a try, but when making concurrent requests, the responses generated by inference come back as garbled text....
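I restrict the output language in the prompt, but when running with https://github.com/dachengai/vllm the output sometimes comes out in a different language. This does not happen with Transformers.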
To solve https://github.com/vllm-project/vllm/issues/8835, add `sampler_priority` to control the execution order of the samplers, and add `repetition_penalty_range` to control the range of tokens that the penalty samplers apply to.
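As a rough illustration of what a penalty range could mean, the sketch below applies the widely used repetition-penalty rule (divide positive logits by the penalty, multiply negative ones) only to tokens seen in the last `penaltyRange` positions of the history. This is not vLLM code: the function name, the parameter names, and the assumption that the range counts the most recent tokens (with 0 meaning the whole history) are all hypothetical.

```go
package main

// applyRepetitionPenalty penalizes, in place, the logits of tokens that
// appear in the last penaltyRange entries of history.
// Assumption: penaltyRange <= 0 means "use the whole history".
func applyRepetitionPenalty(logits []float64, history []int, penalty float64, penaltyRange int) {
	start := 0
	if penaltyRange > 0 && len(history) > penaltyRange {
		start = len(history) - penaltyRange
	}
	seen := make(map[int]bool)
	for _, tok := range history[start:] {
		// Skip out-of-vocabulary ids and tokens already penalized once.
		if tok < 0 || tok >= len(logits) || seen[tok] {
			continue
		}
		seen[tok] = true
		if logits[tok] > 0 {
			logits[tok] /= penalty
		} else {
			logits[tok] *= penalty
		}
	}
}
```

With a penalty greater than 1, both branches push the logit down, so recently used tokens become less likely while tokens outside the range are left untouched.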