May not even be a Transformers issue.. WizardLM-Uncensored-Falcon-40

Open linuxmagic-mp opened this issue 2 years ago • 1 comments

I could use some feedback on debugging with ctransformers. I have a strange case where things generally work, but occasionally I get no output. I am using /models/WizardLM-Uncensored-Falcon-40b/ggml-model-falcon-40b-wizardlm-qt_k5.bin (GGML).

tokens = llm.tokenize('I want to give you a female name.  What is your favourite female names, give me your top five.  And a preference on what you preferred to be called.')

for token in llm.generate(tokens):
    print(llm.detokenize(token))

This always works.

print(llm('I want to give you a female name.  What is your favourite female names, give me your top five.  And a preference on what you preferred to be called.'))

This sometimes produces NO output.

I'm scratching my head over how to debug this.

Aug 12 '23 17:08 linuxmagic-mp

llm(...) doesn't return until the entire text is generated, whereas llm.generate(...) yields tokens one by one as they are generated. Is it exiting without error and without printing anything? Try using stream=True:

for text in llm(prompt, stream=True):
    print(text)
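
To see what the stream actually contains, it may help to print each chunk with repr(), so that empty strings and whitespace-only output become visible. The following is a minimal sketch, not part of ctransformers: inspect_stream is a hypothetical helper name, and it only assumes that llm(prompt, stream=True) yields strings, as in the loop above.

```python
from typing import Iterable, List, Tuple


def inspect_stream(chunks: Iterable[str]) -> Tuple[str, List[str]]:
    """Consume a stream of text chunks, echoing each one with repr().

    Hypothetical debugging helper; works on any iterable of strings.
    """
    seen: List[str] = []
    for i, chunk in enumerate(chunks):
        # repr() makes empty strings, spaces, and newlines visible.
        print(f"chunk {i}: {chunk!r}")
        seen.append(chunk)

    text = "".join(seen)
    if not seen:
        # Nothing was yielded at all, e.g. generation ended immediately.
        print("no chunks were produced")
    elif not text.strip():
        # Something was yielded, but it would look blank when printed.
        print(f"only whitespace was produced: {text!r}")
    return text, seen


# Usage (assumes `llm` and `prompt` are defined as earlier in this thread):
# text, chunks = inspect_stream(llm(prompt, stream=True))
```

If the helper reports no chunks or only whitespace, the "missing" output is probably an empty or blank generation rather than a crash, which narrows down where to look next.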

Aug 15 '23 12:08 marella