Short answers are generated. I need long ones.
Hello, I'm asking for help: the answers it generates are short, and I need long ones. Let me say up front that I have read through this whole topic and tried the different solutions. I am running it on Windows 11.
I tried running it with these parameters:

F:\Alpakatestsborca\alpaca.cpp> .\Release\chat.exe -n 1024 -c 2048

but that didn't change anything.
I did manage to fix the crash on long prompts by editing chat.cpp; it worked after I recompiled. Maybe the situation here is similar, but I just don't know where in the code to put these parameters. Please help me.
I build on Windows 11 with:

cmake .
cmake --build . --config Release
PS F:\Alpakatestsborca\alpaca.cpp> .\Release\chat.exe -n 1024 -c 2048
main: seed = 1679673643
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMA.
- If you want to submit another line, end your input in '\'.
help
Get extra RAM (at least 32GB) and use the 30B model. It provides quite long replies and has excellent reasoning skills.
Can you share what settings and prompts you're using? I'm using the 30B model and am also struggling to get longer responses. Every once in a while I'll get a long response, but most of the time they're just as short as the 7B model.
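For comparison, this is roughly the invocation I'd try. It's only a sketch: -n and -c are the flags already used in this thread, and the sampling flags are my assumption based on the parameters shown in the startup log (temp, top_p, repeat_penalty), so verify them with --help on your build.

```shell
# Sketch only: sampling flag names are assumptions inferred from the startup log.
# A higher temperature than the logged temp = 0.1, plus a milder repeat penalty,
# tends to produce longer, less clipped replies.
.\Release\chat.exe -n 1024 -c 2048 --temp 0.7 --top_p 0.95 --repeat_penalty 1.1
```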
The 30B model looks like it disappeared from the readme. Am I missing something? How can I get it? Thank you