Unexpected output
EDIT: I'm running this on an M1 MacBook. Using the model directly works as expected, but running it through the Python bindings gives me the output below. The .dylib is built from source as well.
Do you know what could be giving me this output? Using the model without the bindings works as expected...
"id": "cmpl-f49883d5-e368-4fa0-a4fa-bf758daa1831",
"object": "text_completion",
"created": 1680203705,
"model": "ggml-model-q4_0-new.bin",
"choices": [
{
"text": "Question: What are the names of the planets in the solar system? Answer: \u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 19,
"completion_tokens": 48,
"total_tokens": 67
}
}
What are the model / eval parameters you're using? And can you confirm that llama.cpp works as expected with the same params?
I've run across something similar before when evaluating the model with temperature=0, but I didn't dig too deep into it at the time.
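For what it's worth, one way to rule out the library defaults is to pass the sampling parameters explicitly on the call. This is just a sketch using the standard llama-cpp-python call signature; the model path and the particular values below are chosen for illustration only:

from llama_cpp import Llama

# Illustrative model path; substitute your own quantized weights.
llm = Llama(model_path="ggml-model-q4_0-new.bin")

output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    temperature=0.8,  # explicitly non-zero, in case a temperature=0 setting is the culprit
    top_p=0.95,
    stop=["Q:", "\n"],
    echo=True,
)
print(output["choices"][0]["text"])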
What are the model / eval parameters you're using?
All defaults. I'm running the 7B 4-bit quantized llama model. I also have a 7B 4-bit quantized alpaca model, both converted to the new format. Both models work as expected when run directly with llama.cpp. This is the example I'm running:
import json
import argparse

from llama_cpp import Llama

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="ggml-model-q4_0-new.bin")
args = parser.parse_args()

llm = Llama(model_path=args.model)

output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    stop=["Q:", "\n"],
    echo=True,
)

print(json.dumps(output, indent=2))
I tried out the embeddings example and that seemed to work fine.
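Roughly, an embeddings call with these bindings looks like the sketch below (assuming the same model file as above; embedding=True and embed() are what llama-cpp-python exposes for this):

from llama_cpp import Llama

# Load the model with embeddings enabled and embed a short piece of text.
llm = Llama(model_path="ggml-model-q4_0-new.bin", embedding=True)
vector = llm.embed("The planets orbit the sun.")
print(len(vector))  # dimensionality of the returned embedding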
Do you get more meaningful output if you set the seed to random?
llm = Llama(model_path="ggml-model-q4_0-new.bin", seed=-1)
@rjadr Nope, that didn't seem to affect it.
Sorry @jessejohnson, I haven't been able to reproduce this (tried on an M1 MacBook Air with the alpaca 7B weights). This might be a long shot, but which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?
which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?
Sorry for the late reply, mate. I use Conda. And yes, both weights give me the same output. I haven't made time to investigate further.
@jessejohnson are you still seeing the problem with the latest package and latest quantized weights?
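If it helps when you pick this back up, here's a quick way to confirm which version of the bindings is installed in the active Conda environment. This is only a sketch; the distribution name is assumed to be llama-cpp-python, so adjust it if your install registers it differently:

from importlib.metadata import version

# Report the installed llama-cpp-python version in the current environment.
print(version("llama-cpp-python"))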
Hey @gjmulder, I haven't had the chance to play with this for a while, but I'll get back to it when I have the time.
Please reopen when you have time.
@gjmulder Just returned to this, using the latest llama.cpp/llama-cpp-python. I get the expected output.