
Can't find model in directory when merging model

jared-taylor opened this issue 1 year ago · 7 comments

I'm trying to merge fine-tuned safetensors into a base model so I can save it in GGUF format. The directory contains adapter_config.json and adapter_model.safetensors, but the error message says no model was found in that directory. Here is the code I'm using, and below is the exact error message:

!python llama.cpp/convert.py Falcon_merged \
  --outfile falcon-merged.gguf \
  --outtype q8_0

Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1486, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1422, in main
    model_plus = load_some_model(args.model)
  File "/content/llama.cpp/convert.py", line 1280, in load_some_model
    raise Exception(f"Can't find model in directory {path}")
Exception: Can't find model in directory Falcon_merged

jared-taylor avatar Mar 14 '24 17:03 jared-taylor

I think you should use the convert-hf-to-gguf.py script, and make sure the safetensors (or PyTorch) model is in that folder. adapter_model.safetensors is not a model; it's a LoRA adapter that you need to merge with its base model.
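
A quick way to see what the convert script is (not) finding is to list the folder contents; a minimal check, with Falcon_merged being your folder from above:

import os

# an adapter-only folder looks like ['adapter_config.json', 'adapter_model.safetensors'];
# a merged model folder should also contain config.json plus the full weights,
# e.g. model-00001-of-00001.safetensors or pytorch_model.bin
print(sorted(os.listdir("Falcon_merged")))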

Ar57m avatar Mar 15 '24 00:03 Ar57m

Thanks, all this time I assumed the issue was with the GGUF conversion, but the real issue is with the merging. The files I mentioned above are all that gets created when I run the following code, which I assume is not how it is supposed to work:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("ybelkada/falcon-7b-sharded-bf16", low_cpu_mem_usage=True)
peft_model_id = "fishbeef/autotrain_test_falcon"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.merge_and_unload()
model.save_pretrained("Falcon_merged")

What files SHOULD be getting created by that process? None of the many guides I've looked at mention what the output of save_pretrained() should be.

jared-taylor avatar Mar 15 '24 01:03 jared-taylor

I think something similar to "model-00001-of-00001.safetensors" or pytorch_model.bin (I can't remember which) should appear inside the Falcon_merged folder.
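
Looking at your snippet again, I suspect the cause: merge_and_unload() returns the merged model rather than modifying the wrapper in place, so in your code the merged model was discarded and save_pretrained() on the PeftModel wrote out only the adapter files again. If that's it, the fix is just to keep the return value (base_model and peft_model_id as in your snippet):

model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.merge_and_unload()        # keep the returned merged model
model.save_pretrained("Falcon_merged")  # now writes the full model weights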

I use this very similar script, though I haven't tested it on Falcon:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "base/model"
LORA_WEIGHTS = "lora/lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# load the full-precision base model
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=False,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="offload",
)

# wrap it with the LoRA adapter
model = PeftModel.from_pretrained(
    model,
    LORA_WEIGHTS,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    offload_folder="offload",
)

# merge_and_unload() returns the merged model, so keep the return value
model = model.merge_and_unload()
model.save_pretrained("merged_model")

Ar57m avatar Mar 15 '24 02:03 Ar57m

Thanks a ton, that seems to have worked, as I now have the "model-00001-of-00001.safetensors" file you mentioned, which I recall seeing on other models as well. Now I'm back to being stuck on the conversion, but for other reasons. I switched scripts because I realized I should not be using convert.py, but rather convert-hf-to-gguf.py:

!python llama.cpp/convert-hf-to-gguf.py /content/Falcon_merged \
  --outfile falcon-merged.gguf \
  --outtype f16

The problem now is that it can't load the tokenizer. There is no tokenizer file in that directory. Should one have been created by the steps you provided, or should I be getting it from either the base model or my LoRA weights? Both have a tokenizer on the HF model hub, but I'm not sure which to use.

jared-taylor avatar Mar 20 '24 17:03 jared-taylor

I think you should use the base tokenizer.
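
Something like this should drop the tokenizer files next to the merged weights (using your base repo from above):

from transformers import AutoTokenizer

# the base model's tokenizer, saved into the merged folder
tokenizer = AutoTokenizer.from_pretrained("ybelkada/falcon-7b-sharded-bf16")
tokenizer.save_pretrained("Falcon_merged")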

Ar57m avatar Mar 20 '24 17:03 Ar57m

I tried using the base tokenizer, and I also added tokenizer.save_pretrained("Falcon_merged") to my code. Both attempts returned the same error from convert-hf-to-gguf.py:

RuntimeError: shape '[71, 3, 64, 4544]' is invalid for input of size 21229568

I haven't been able to find any info about what is causing the error. Have you seen this before, or do you know why it's happening?

jared-taylor avatar Mar 21 '24 12:03 jared-taylor

I've never seen that one, unfortunately; I don't know how to help with it.

Ar57m avatar Mar 21 '24 13:03 Ar57m

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar May 05 '24 01:05 github-actions[bot]