Penut Chen
Is there any plan to support these models, now that [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b) and [OpenLLaMA-13B](https://huggingface.co/openlm-research/open_llama_13b) have been fully trained and released?
Will a larger vocabulary size for multi-LoRA be supported in Q2 2024? Related: https://github.com/vllm-project/vllm/issues/3000
+1. We need support for larger vocabulary sizes. Beyond vocabulary size, there is also the use case of LLaMA Pro block-expansion fine-tuning with LoRA, which may not...
I think the sliding window trick is based on masking? https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L998-L1018
I have the same question. I think the sliding window has two aspects: 1. From the perspective of the attention mask, it essentially acts as a token-level sliding window that...
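To make the masking aspect concrete, here is a minimal sketch of a token-level sliding-window mask. This is my own illustration in PyTorch, not the actual transformers implementation (which builds an additive float mask with `-inf` for blocked positions rather than a boolean one):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query token i attends to keys j with i - window < j <= i, i.e. a
    causal mask further restricted to the last `window` tokens.
    """
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]             # j <= i
    in_window = idx[:, None] - idx[None, :] < window  # i - j < window
    return causal & in_window

# Row i has ones only in columns max(0, i - window + 1) .. i
print(sliding_window_causal_mask(6, window=3).int())
```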
@s-JoL I noticed that the links pertaining to Open-LLaMA currently return 404 errors. Could you please provide some information on what might have happened?
@heya5 Possibly due to some controversy surrounding the project, the original author has taken it down. https://github.com/chenfeng357/open-Chinese-ChatLLaMA/issues/1
There might be more information in [ggml](https://github.com/ggerganov/ggml), since it ships conversion scripts that convert model weights from the HuggingFace format to the ggml format, e.g. [examples/gpt-j](https://github.com/ggerganov/ggml/blob/master/examples/gpt-j/convert-h5-to-ggml.py) and [examples/gpt-2](https://github.com/ggerganov/ggml/blob/master/examples/gpt-2/convert-h5-to-ggml.py).
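The general shape of those scripts is simply iterating over the HF state dict and serializing each tensor with enough metadata (name, shape, dtype) for the target runtime to reload it. A rough sketch of that idea is below; the binary layout here is invented for illustration, and the real ggml scripts write ggml's own header and tensor format:

```python
import struct
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any HF checkpoint

with open("model.bin", "wb") as f:
    for name, tensor in model.state_dict().items():
        data = tensor.to(torch.float32).numpy()
        encoded = name.encode("utf-8")
        # Write name length and rank, then the dims, the name, and the raw data.
        f.write(struct.pack("ii", len(encoded), data.ndim))
        f.write(struct.pack(f"{data.ndim}i", *data.shape))
        f.write(encoded)
        data.tofile(f)
```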
This much-needed feature will enable a 13B model to fit on a single GPU with 24 GB of VRAM.
> Are you wanting `load_in_8bit` from HF or would you consider the AWQ GPTQ support sufficient?

@hmellor Since AWQ is becoming more popular and GPTQ is supported in vLLM, I...
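For context, loading an AWQ-quantized checkpoint in vLLM is already a one-liner; a minimal sketch (the model name is just an example AWQ checkpoint, any AWQ-quantized model should work):

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized 13B model; this fits on a single 24 GB GPU.
llm = LLM(model="TheBloke/Llama-2-13B-AWQ", quantization="awq")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```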