Geonmin Kim
While converting [llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) and [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-v1.3/tree/main), I found that the [convert script](https://github.com/ggerganov/llama.cpp) skips the `rope_freqs` and `attn_rot_embd` modules in each layer. These are also listed in [MODEL_TENSOR_SKIP](https://github.com/ggerganov/llama.cpp/blob/6560bed3f066c876682464762cad90f1e28e3f1b/gguf-py/gguf/constants.py#L518-L522). What are they and why can...
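For context, this is a minimal sketch of the skipping behavior I am asking about. It is not the actual convert script: the names `SKIPPED_TENSOR_SUFFIXES` and `convert_layer` are hypothetical, and the real list lives in `MODEL_TENSOR_SKIP` in `gguf-py/gguf/constants.py`.

```python
# Hypothetical illustration of the skip logic, NOT the actual convert script.
# In llama.cpp's gguf-py, MODEL_TENSOR_SKIP lists tensor kinds (rope_freqs,
# attn_rot_embd, ...) that the converter does not write into the output GGUF.
SKIPPED_TENSOR_SUFFIXES = ("rope_freqs", "attn_rot_embd")  # assumed naming

def convert_layer(tensors: dict):
    """Yield (name, tensor) pairs that should be written to the GGUF file."""
    for name, tensor in tensors.items():
        if name.endswith(SKIPPED_TENSOR_SUFFIXES):
            # These tensors are skipped: my question is what they hold and
            # why dropping them is safe for llama-2 / vicuna checkpoints.
            continue
        yield name, tensor
```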
Thanks for sharing the interesting paper and code. As far as I understand, all the loss functions used in the paper are implemented in [encoder.py](https://github.com/princeton-nlp/DensePhrases/blob/main/densephrases/encoder.py). However, it is unclear to me...
I can see that optimum-quanto provides several external (weight-only) quantization algorithms such as SmoothQuant and AWQ [here](https://github.com/huggingface/optimum-quanto/tree/main/external). It looks like SmoothQuant only supports OPT models, and AWQ is still...
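For reference, this is the kind of weight-only flow I have in mind; a minimal sketch using the core `quantize`/`freeze` API from the quanto README (the model name and dtype are just examples), as opposed to the external SmoothQuant/AWQ scripts, which are separate calibration-based flows.

```python
# Minimal weight-only quantization sketch with optimum-quanto's core API
# (quantize/freeze as shown in the project README). The external smoothquant
# and awq directories referenced above are separate calibration-based flows.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import quantize, freeze, qint4

model_id = "facebook/opt-125m"  # example model, not a requirement
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize(model, weights=qint4)   # replace Linear weights with 4-bit versions
freeze(model)                    # materialize the quantized weights

inputs = tokenizer("Hello, my name is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```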
### Name and Version

```
$./llama-cli --version
load_backend: loaded CPU backend from /app/libggml-cpu-icelake.so
version: 5280 (27aa2595)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```

### Operating systems

Linux

### GGML backends...
### System Info

- CPU: x86_64
- GPU name: NVIDIA H100

### Who can help?

_No response_

### Information

- [X] The official example scripts
- [ ] My own...
Torchtune provides several configs for fine-tuning LLMs (such as [llama3_2/3B_full.yaml](https://github.com/pytorch/torchtune/blob/main/recipes/configs/llama3_2/3B_full.yaml)), and they often use the Alpaca dataset. Could you suggest some evaluation benchmarks for checking whether an LLM has been trained properly with Alpaca (or...
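To make the question concrete, here is the kind of check I am thinking of: a sketch using EleutherAI's lm-evaluation-harness Python API on a fine-tuned checkpoint. The checkpoint path and the task list are placeholders; which tasks are actually meaningful for an Alpaca-style instruction-tuned model is exactly what I am asking.

```python
# Sketch: evaluate a fine-tuned checkpoint with lm-evaluation-harness.
# The checkpoint path and task selection below are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/finetuned-llama3_2-3b",  # placeholder path
    tasks=["hellaswag", "arc_easy"],  # placeholder tasks
    batch_size=8,
)
print(results["results"])
```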
I found that a Qwen-2 pruning example was recently added to Torch-Pruning: https://github.com/VainF/Torch-Pruning/tree/master/examples/LLMs#rocket-qwenqwen2-7b Thanks for the update! I tried this script on a block-pruned (4 blocks) Qwen-2 architecture.

* Original architecture ```...
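For clarity, this is roughly what I mean by "block-pruned (4 blocks)": a sketch that simply drops the last four decoder layers of a Hugging Face Qwen2 model. This is not the Torch-Pruning script itself, only an illustration of the target architecture.

```python
# Illustration of "block pruning": drop 4 decoder layers from Qwen2-7B.
# This is NOT the Torch-Pruning example script, only a sketch of the
# pruned architecture being discussed.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")

keep = len(model.model.layers) - 4              # remove 4 transformer blocks
model.model.layers = nn.ModuleList(model.model.layers[:keep])
model.config.num_hidden_layers = keep           # keep the config consistent

print(model.config.num_hidden_layers, "layers remain")
```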