Penut Chen
Is there any plan to support these models, now that [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b) and [OpenLLaMA-13B](https://huggingface.co/openlm-research/open_llama_13b) have been fully trained and released?
Will a larger vocabulary size for multi-LoRA be supported in Q2 2024? Related: https://github.com/vllm-project/vllm/issues/3000
+1. We need support for larger vocabulary sizes. Beyond vocabulary size, there is also the use case of LLaMA Pro block-expansion fine-tuning with LoRA, which may not...
I think the sliding window trick is based on masking? https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L998-L1018
I have the same question. I think the sliding window has two aspects: 1. From the perspective of the attention mask, it essentially acts as a token-level sliding window that...
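To make the masking aspect concrete, here is a minimal sketch of a token-level sliding-window mask. This is my own illustration in PyTorch, not the actual transformers implementation (which builds an additive float mask with `-inf` for blocked positions rather than a boolean one):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query token i attends to keys j with i - window < j <= i, i.e. a
    causal mask further restricted to the last `window` tokens.
    """
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]             # j <= i
    in_window = idx[:, None] - idx[None, :] < window  # i - j < window
    return causal & in_window

# Row i has ones only in columns max(0, i - window + 1) .. i
print(sliding_window_causal_mask(6, window=3).int())
```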
@s-JoL I noticed that the links pertaining to Open-LLaMA currently return 404 errors. Could you please provide some information on what might have happened?
@heya5 Possibly due to some controversy surrounding the project, the original author has taken it down. https://github.com/chenfeng357/open-Chinese-ChatLLaMA/issues/1
There might be more information in [ggml](https://github.com/ggerganov/ggml), since it ships conversion scripts that convert model weights from the HuggingFace format to the ggml format, e.g. [examples/gpt-j](https://github.com/ggerganov/ggml/blob/master/examples/gpt-j/convert-h5-to-ggml.py) and [examples/gpt-2](https://github.com/ggerganov/ggml/blob/master/examples/gpt-2/convert-h5-to-ggml.py).
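The general shape of those scripts is simply iterating over the HF state dict and serializing each tensor with enough metadata (name, shape, dtype) for the target runtime to reload it. A rough sketch of that idea is below; the binary layout here is invented for illustration, and the real ggml scripts write ggml's own header and tensor format:

```python
import struct
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any HF checkpoint

with open("model.bin", "wb") as f:
    for name, tensor in model.state_dict().items():
        data = tensor.to(torch.float32).numpy()
        encoded = name.encode("utf-8")
        # Write name length and rank, then the dims, the name, and the raw data.
        f.write(struct.pack("ii", len(encoded), data.ndim))
        f.write(struct.pack(f"{data.ndim}i", *data.shape))
        f.write(encoded)
        data.tofile(f)
```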
This much-needed feature will enable a 13B model to fit on a single GPU with 24 GB of VRAM.
> Are you wanting `load_in_8bit` from HF or would you consider the AWQ GPTQ support sufficient?

@hmellor Since AWQ is becoming more popular and GPTQ is supported in vLLM, I...
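For context, loading an AWQ-quantized checkpoint in vLLM is already a one-liner; a minimal sketch (the model name is just an example AWQ checkpoint, any AWQ-quantized model should work):

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized 13B model; this fits on a single 24 GB GPU.
llm = LLM(model="TheBloke/Llama-2-13B-AWQ", quantization="awq")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```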