
Unexpected output

jessejohnson opened this issue 2 years ago · 6 comments

EDIT: I'm running this on an M1 MacBook. Using the model directly works as expected, but running it through the Python bindings gives me this output. The .dylib binary is built from source too.

Do you know what could be giving me this output? Using the model without the bindings works as expected...

  "id": "cmpl-f49883d5-e368-4fa0-a4fa-bf758daa1831",
  "object": "text_completion",
  "created": 1680203705,
  "model": "ggml-model-q4_0-new.bin",
  "choices": [
    {
      "text": "Question: What are the names of the planets in the solar system? Answer: \u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 48,
    "total_tokens": 67
  }
}
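(For reference, the repeated "\u001c" escape in the completion text is the ASCII File Separator control character, which is why it renders as nothing visible when printed; a quick check in Python:)

# "\u001c" is the non-printing File Separator control character (code point 28).
print(repr("\u001c"), ord("\u001c"))  # -> '\x1c' 28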

jessejohnson avatar Mar 30 '23 19:03 jessejohnson

What are the model / eval parameters you're using? And can you confirm that llama.cpp works as expected with the same params?

I've run across something similar before when evaluating the model with temperature=0, but I didn't dig too deep into it at the time.
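One quick thing to try, as a sketch (assuming your version of the bindings accepts the usual sampling keywords such as `temperature` on the completion call), is to pass an explicitly non-zero temperature and see whether the output changes:

from llama_cpp import Llama

llm = Llama(model_path="ggml-model-q4_0-new.bin")

# Explicitly request non-greedy sampling to rule out a temperature=0 issue.
output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    temperature=0.8,
)
print(output["choices"][0]["text"])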

abetlen avatar Mar 31 '23 01:03 abetlen

What are the model / eval parameters you're using?

All defaults. I'm running the 7B 4-bit quantized llama model. I also have a 7B 4-bit quantized alpaca model, both converted to the new format. Both models work as expected when run directly with llama.cpp. This is the example I'm running:

import json
import argparse

from llama_cpp import Llama

# Accept an optional model path (defaults to the newly converted q4_0 weights).
parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="ggml-model-q4_0-new.bin")
args = parser.parse_args()

llm = Llama(model_path=args.model)

# Run a completion with default sampling parameters and echo the prompt back.
output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    stop=["Q:", "\n"],
    echo=True,
)

print(json.dumps(output, indent=2))

I tried out the embeddings example and that seemed to work fine.
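(For context, the embeddings check was along these lines; this is only a sketch, assuming the `embedding=True` constructor flag and `create_embedding` method used in the project's embeddings example.)

from llama_cpp import Llama

# Load the same weights with embeddings enabled.
llm = Llama(model_path="ggml-model-q4_0-new.bin", embedding=True)

# Embed a short piece of text and check that the vector comes back non-empty.
result = llm.create_embedding("Hello, world!")
print(len(result["data"][0]["embedding"]))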

jessejohnson avatar Mar 31 '23 02:03 jessejohnson

Do you get more meaningful output if you set the seed to random?

llm = Llama(model_path="ggml-model-q4_0-new.bin", seed=-1)

rjadr avatar Mar 31 '23 19:03 rjadr

@rjadr Nope, that didn't seem to affect it.

jessejohnson avatar Mar 31 '23 23:03 jessejohnson

Sorry @jessejohnson, I haven't been able to reproduce this (tried on an M1 MacBook Air with the alpaca 7B weights). This might be a long shot, but which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?

abetlen avatar Apr 01 '23 18:04 abetlen

which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?

Sorry for the late reply, mate. I use Conda. And yes, both weights give me the same output. I haven't made time to investigate further.

jessejohnson avatar Apr 13 '23 10:04 jessejohnson

@jessejohnson are you still seeing the problem with the latest package and latest quantized weights?

gjmulder avatar May 15 '23 11:05 gjmulder

Hey @gjmulder, I haven't had the chance to play with this for a while, but I'll get back to it when I have the time.

jessejohnson avatar May 17 '23 15:05 jessejohnson

Please reopen when you have time.

gjmulder avatar May 17 '23 16:05 gjmulder

@gjmulder Just returned to this using the latest llama.cpp / llama-cpp-python, and I now get the expected output.

jessejohnson avatar May 31 '23 16:05 jessejohnson