Omar Costilla Reyes
I think @jnothman's reference is the best we currently have. Does anyone know if this will be ported to Eli? Thanks!
I have the same problem:

```
Process SpawnPoolWorker-9:
Traceback (most recent call last):
  File "C:\Users\Omar\AppData\Local\conda\conda\envs\tensor19\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\Omar\AppData\Local\conda\conda\envs\tensor19\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Omar\AppData\Local\conda\conda\envs\tensor19\lib\multiprocessing\pool.py",...
```
code only works with 1 worker :(
```
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

The history of humanity starts with the bing bang, then ête estudios...
```
The same happens with the 30B model:

```
ubuntu@ip-x:~/llama.cpp$ ./main -m ./models/30B/ggml-model-q4_0.bin \
    -t 16 \
    -n 1000000 \
    -p 'The history of humanity starts with the bing bang, then...
```
I can confirm that 7B and 13B work for me. 30B and 65B are the ones not giving correct output.
Can you share the size of the files as well, and a successfully executed example with both models? Thanks!
I can confirm that the latest branch (March 15, 2023) works for all models. If you had problems, you will have to redo the quantization to make it work.
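For anyone who needs to redo it, the conversion and quantization steps look roughly like this (a sketch based on the llama.cpp tooling of that era; exact script names, paths, and arguments depend on your checkout, and the commands assume you have the original model weights under `models/30B/`):

```shell
# rebuild the binaries on the latest branch
make

# re-convert the original weights to the f16 ggml format
# (script name/arguments may differ in your checkout)
python3 convert-pth-to-ggml.py models/30B/ 1

# re-quantize to q4_0; the trailing "2" selects the q4_0 type
./quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q4_0.bin 2
```

The key point is that quantized files produced before the format change are not compatible, so both the convert and quantize steps have to be rerun.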
The 65B model then uses ~31% of 128 GB of RAM when performing inference.
Example output:

```
ubuntu@ip-x:~/llama.cpp$ ./main -m ./models/65B/ggml-model-q4_0.bin \
    -t 32 \
    -n 100000 \
    -p 'The history of humanity from bing bang to today can be devided...
```