
Unexpected output

jessejohnson opened this issue 2 years ago · 6 comments

EDIT: I'm running this on an M1 MacBook. Using the model directly works as expected, but running it through the Python bindings gives me this output. The .dylib binary is built from source too.

Do you know what could be giving me this output? Using the model without the bindings works as expected...

  "id": "cmpl-f49883d5-e368-4fa0-a4fa-bf758daa1831",
  "object": "text_completion",
  "created": 1680203705,
  "model": "ggml-model-q4_0-new.bin",
  "choices": [
    {
      "text": "Question: What are the names of the planets in the solar system? Answer: \u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 48,
    "total_tokens": 67
  }
}
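(For reference, the repeated "\u001c" escape in the completion text is the ASCII File Separator control character, which is why it renders as nothing visible when printed; a quick check in Python:)

# "\u001c" is the non-printing File Separator control character (code point 28).
print(repr("\u001c"), ord("\u001c"))  # -> '\x1c' 28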

jessejohnson avatar Mar 30 '23 19:03 jessejohnson

What are the model / eval parameters you're using? And can you confirm that llama.cpp works as expected with the same params?

I've run across something similar before when evaluating the model with temperature=0, but I didn't dig too deep into it at the time.
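One quick thing to try, as a sketch (assuming your version of the bindings accepts the usual sampling keywords such as `temperature` on the completion call), is to pass an explicitly non-zero temperature and see whether the output changes:

from llama_cpp import Llama

llm = Llama(model_path="ggml-model-q4_0-new.bin")

# Explicitly request non-greedy sampling to rule out a temperature=0 issue.
output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    temperature=0.8,
)
print(output["choices"][0]["text"])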

abetlen avatar Mar 31 '23 01:03 abetlen

What are the model / eval parameters you're using?

All defaults. I'm running the 7B 4-bit quantized llama model. I also have a 7B 4-bit quantized alpaca model, both converted to the new format. Both models work as expected when run directly with llama.cpp. This is the example I'm running:

import json
import argparse

from llama_cpp import Llama

# Accept an optional model path (defaults to the newly converted q4_0 weights).
parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="ggml-model-q4_0-new.bin")
args = parser.parse_args()

llm = Llama(model_path=args.model)

# Run a completion with default sampling parameters and echo the prompt back.
output = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    stop=["Q:", "\n"],
    echo=True,
)

print(json.dumps(output, indent=2))

I tried out the embeddings example and that seemed to work fine.
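(For context, the embeddings check was along these lines; this is only a sketch, assuming the `embedding=True` constructor flag and `create_embedding` method used in the project's embeddings example.)

from llama_cpp import Llama

# Load the same weights with embeddings enabled.
llm = Llama(model_path="ggml-model-q4_0-new.bin", embedding=True)

# Embed a short piece of text and check that the vector comes back non-empty.
result = llm.create_embedding("Hello, world!")
print(len(result["data"][0]["embedding"]))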

jessejohnson avatar Mar 31 '23 02:03 jessejohnson

Do you get more meaningful output if you set the seed to random?

llm = Llama(model_path="ggml-model-q4_0-new.bin", seed=-1)

rjadr avatar Mar 31 '23 19:03 rjadr

@rjadr Nope, that didn't seem to affect it.

jessejohnson avatar Mar 31 '23 23:03 jessejohnson

Sorry @jessejohnson, I haven't been able to reproduce this (tried on an M1 MacBook Air with the alpaca 7B weights). This might be a long shot, but which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?

abetlen avatar Apr 01 '23 18:04 abetlen

which package manager did you use to install this? Also, did you get the same weird outputs with the alpaca weights as with the llama weights?

Sorry for the late reply, mate. I use Conda. And yes, both weights give me the same output. I haven't made time to investigate further.

jessejohnson avatar Apr 13 '23 10:04 jessejohnson

@jessejohnson are you still seeing the problem with the latest package and latest quantized weights?

gjmulder avatar May 15 '23 11:05 gjmulder

Hey @gjmulder, I haven't had the chance to play with this for a while, but I'll get back to it when I have the time.

jessejohnson avatar May 17 '23 15:05 jessejohnson

Please reopen when you have time.

gjmulder avatar May 17 '23 16:05 gjmulder

@gjmulder Just returned to this using the latest llama.cpp / llama-cpp-python, and I now get the expected output.

jessejohnson avatar May 31 '23 16:05 jessejohnson