Trying to set up Llama3-8B-1.58-100B-tokens with i2_s
I manually downloaded the model and set up the environment with the command "python setup_env.py -md .\models\Llama3-8B-1.58-100B-tokens -q i2_s" on Windows 11. The result shows:
"ERROR:root:Error occurred while running command: Command '['./build/bin/Release/llama-quantize', '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf', '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-i2_s.gguf', 'I2_S', '1']' returned non-zero exit status 1., check details in logs\quantize_to_i2s.log"
After checking the log, it says:

```
main: build = 22 (bf11a49)
main: built with Clang 18.1.8 for Win32
main: quantizing '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf' to '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-i2_s.gguf' as I2_S using 1 threads
llama_model_quantize: failed to quantize: tensor 'output.weight' data is not within the file bounds, model is corrupted or incomplete
main: failed to quantize model from '.\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf'
```
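The "data is not within the file bounds" check means a tensor's recorded offset plus size extends past the end of the GGUF file, which usually indicates a truncated download or a conversion that was cut short. As a quick sanity check, here is a minimal sketch using the `gguf` Python package from llama.cpp (assuming its `GGUFReader` API and that `data_offset` is an absolute file offset, as in recent gguf-py versions):

```python
import os
from gguf import GGUFReader  # pip install gguf

path = r".\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf"
file_size = os.path.getsize(path)

reader = GGUFReader(path)  # may itself raise on a badly truncated file
for t in reader.tensors:
    # Each tensor records where its data starts and how many bytes it occupies;
    # if offset + size runs past the end of the file, the file is truncated.
    end = t.data_offset + t.n_bytes
    if end > file_size:
        print(f"{t.name}: data ends at byte {end}, but the file is only {file_size} bytes")
```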
Did I download the wrong version of Llama3-8B-1.58-100B-tokens, or did I do something else wrong?
Same here:
The key issue appears to be:

```
GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed
```
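For context, that assert fires when the number of attention V projection tensors llama.cpp finds does not match the model's layer count, which again points at a file missing later layers. A hedged sketch to count them with the same `gguf` reader (assuming the standard GGUF tensor naming, e.g. `blk.0.attn_v.weight`):

```python
from gguf import GGUFReader

reader = GGUFReader(r".\models\Llama3-8B-1.58-100B-tokens\ggml-model-f32.gguf")
n_attn_v = sum(1 for t in reader.tensors if t.name.endswith("attn_v.weight"))
# Llama3-8B has 32 transformer layers, so fewer than 32 suggests a truncated file.
print(f"found {n_attn_v} attn_v.weight tensors (expected 32)")
```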
--
Update: the model had not been downloaded correctly. I re-ran "python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s" and all went well.
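If you download manually, it is worth verifying the files before converting. A minimal sketch (assuming the `huggingface_hub` package; `files_metadata=True` asks the Hub to include per-file sizes) that compares local file sizes against what the Hub reports:

```python
import os
from huggingface_hub import HfApi

repo_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
local_dir = r".\models\Llama3-8B-1.58-100B-tokens"

info = HfApi().model_info(repo_id, files_metadata=True)
for sibling in info.siblings:
    local_path = os.path.join(local_dir, sibling.rfilename)
    if not os.path.exists(local_path):
        print(f"missing: {sibling.rfilename}")
    elif sibling.size is not None and os.path.getsize(local_path) != sibling.size:
        print(f"size mismatch: {sibling.rfilename}")
```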
The model conversion process takes too much memory. We recommend directly downloading the newly released official BitNet model instead, thanks.
Hi @sd983527, how much memory is needed to convert Llama3-8B-1.58-100B-tokens? Do you plan to provide a converted version of this model? Thanks in advance.