Llama-3 8B is not supported
When I run:
RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct
I get:
RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.47s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
USER: hey
ASSISTANT: Traceback (most recent call last):
  ...
  File "/home/liranringel/REST/rest/model/modeling_llama_kv.py", line 594, in forward
    key_states = past_key_value[0].cat(key_states, dim=2)
  File "/home/liranringel/REST/rest/model/kv_cache.py", line 66, in cat
    dst.copy_(tensor)
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1
Have you encountered the problem of segmentation fault (core dumped) when using Llama-3-8B and running python3 get_datastore_chat.py --model-path Meta-Llama-3-8B-Instruct?
Hi, modeling_llama_kv.py is adapted from an older version of the Transformers library for Llama-2, and the changes are marked with [MODIFIED] (only a few lines of code). For Llama-3, you may adapt modeling_llama.py from the latest Transformers library to ensure the correct configs are honored (e.g., grouped-query attention).
Hi,
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1
As for this issue, it's caused by grouped-query attention: Llama-3-8B has 32 query heads but only 8 key/value heads, while the Llama-2-era KV cache assumes the two counts are equal.
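For illustration, here is a minimal numpy sketch of the head-expansion step that grouped-query attention needs (recent versions of Transformers' modeling_llama.py implement the same idea in a helper called repeat_kv; the tensor shapes below are illustrative, the head counts are the real Llama-3-8B values):

```python
import numpy as np

# Llama-3-8B: 32 query heads share 8 key/value heads, so cached key/value
# tensors have 8 heads, not 32. A Llama-2-era cache allocated for 32 heads
# produces exactly the "tensor a (32) ... tensor b (8)" copy error above.
NUM_ATTENTION_HEADS = 32
NUM_KEY_VALUE_HEADS = 8
N_REP = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS  # 4

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand KV heads to match query heads (same idea as Transformers' repeat_kv)."""
    batch, num_kv_heads, seq_len, head_dim = kv.shape
    if n_rep == 1:
        return kv
    # Broadcast each KV head n_rep times, then fold the repeat axis into the head axis.
    expanded = np.broadcast_to(
        kv[:, :, None, :, :], (batch, num_kv_heads, n_rep, seq_len, head_dim)
    )
    return expanded.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)

key_states = np.zeros((1, NUM_KEY_VALUE_HEADS, 5, 128))  # 8 KV heads from the model
print(repeat_kv(key_states, N_REP).shape)  # (1, 32, 5, 128)
```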
segmentation fault (core dumped) when using Llama-3-8B
As for this issue, it's caused by the large vocabulary size of Llama-3 (128,256 tokens), which exceeds the range of a u16 (maximum 65,535).
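To illustrate the overflow (a sketch, not the datastore's actual storage code): any Llama-3 token ID above 65,535 silently wraps when stored as a u16, producing wrong IDs and out-of-range indexing that can crash native code.

```python
import numpy as np

# Llama-3's vocabulary has 128,256 tokens, but u16 only covers 0..65535.
VOCAB_SIZE_LLAMA3 = 128256
U16_MAX = np.iinfo(np.uint16).max  # 65535

token_id = 128000  # a valid Llama-3 token ID
# Casting to u16 wraps modulo 65536: 128000 - 65536 = 62464, a wrong token ID.
wrapped = np.array([token_id], dtype=np.int64).astype(np.uint16)[0]
print(int(wrapped))  # 62464

# u32 covers the full vocabulary, so IDs round-trip correctly.
safe = np.uint32(token_id)
print(int(safe))  # 128000
```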
Both issues are fixed in the llama3 branch, thanks to Chinmaya Andukuri.
@zhenyuhe00 thanks!