Trying to set up Llama3-8B-1.58-100B-tokens with i2_s
I manually downloaded the model and set up the environment with the command "python setup_env.py -md .\models\Llama3-8B-1.58-100B-tokens -q i2_s" on Windows 11. The result shows:
"ERROR:root:Error occurred while running command: Command '['./build/bin/Release/llama-quantize', '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf', '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-i2_s.gguf', 'I2_S', '1']' returned non-zero exit status 1., check details in logs\quantize_to_i2s.log"
After checking the log, it says:

```
main: build = 22 (bf11a49)
main: built with Clang 18.1.8 for Win32
main: quantizing '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf' to '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-i2_s.gguf' as I2_S using 1 threads
llama_model_quantize: failed to quantize: tensor 'output.weight' data is not within the file bounds, model is corrupted or incomplete
main: failed to quantize model from '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf'
```
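The "data is not within the file bounds" check means a tensor's recorded offset plus size extends past the end of the GGUF file, which usually indicates a truncated download or a conversion that was cut short. As a quick sanity check, here is a minimal sketch using the `gguf` Python package from llama.cpp (assuming its `GGUFReader` API and that `data_offset` is an absolute file offset, as in recent gguf-py versions):

```python
import os
from gguf import GGUFReader  # pip install gguf

path = r".\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf"
file_size = os.path.getsize(path)

reader = GGUFReader(path)  # may itself raise on a badly truncated file
for t in reader.tensors:
    # Each tensor records where its data starts and how many bytes it occupies;
    # if offset + size runs past the end of the file, the file is truncated.
    end = t.data_offset + t.n_bytes
    if end > file_size:
        print(f"{t.name}: data ends at byte {end}, but the file is only {file_size} bytes")
```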
Did I download the wrong version of Llama3-8B-1.58-100B-tokens, or did I do something else wrong?
Same here:
The key issue appears to be:

```
GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed
```
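For context, that assert fires when the number of attention V projection tensors llama.cpp finds does not match the model's layer count, which again points at a file missing later layers. A hedged sketch to count them with the same `gguf` reader (assuming the standard GGUF tensor naming, e.g. `blk.0.attn_v.weight`):

```python
from gguf import GGUFReader

reader = GGUFReader(r".\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf")
n_attn_v = sum(1 for t in reader.tensors if t.name.endswith("attn_v.weight"))
# Llama3-8B has 32 transformer layers, so fewer than 32 suggests a truncated file.
print(f"found {n_attn_v} attn_v.weight tensors (expected 32)")
```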
--
Update: the model had not been downloaded correctly. I re-ran "python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s" and all went well.
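If you download manually, it is worth verifying the files before converting. A minimal sketch (assuming the `huggingface_hub` package; `files_metadata=True` asks the Hub to include per-file sizes) that compares local file sizes against what the Hub reports:

```python
import os
from huggingface_hub import HfApi

repo_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
local_dir = r".\models\Llama3-8B-1.58-100B-tokens"

info = HfApi().model_info(repo_id, files_metadata=True)
for sibling in info.siblings:
    local_path = os.path.join(local_dir, sibling.rfilename)
    if not os.path.exists(local_path):
        print(f"missing: {sibling.rfilename}")
    elif sibling.size is not None and os.path.getsize(local_path) != sibling.size:
        print(f"size mismatch: {sibling.rfilename}")
```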
The model conversion process takes too much memory. We recommend directly downloading the newly released official BitNet model instead, thanks.
Hi @sd983527, how much memory is needed to convert Llama3-8B-1.58-100B-tokens? Do you plan to provide a converted version of this model? Thanks in advance.