hx507 comments

Scale buf_size linearly with n_ctx

> Looking further, it also slowly creeps up as the prompt is being read (batch size = 4)

To add another observation, the amount of memory increase per iteration seems to scale quadratically...
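One way to check a "quadratic" observation like this is with finite differences. If samples are taken at evenly spaced iterations and the series is a polynomial of degree k, then its (k+1)-th differences are all zero. The sketch below is a hypothetical helper, not part of llama.cpp, and the sample series in the usage note are synthetic rather than measurements from this issue. To classify the per-iteration increments instead of the running totals, pass the increments.

```python
def differences(values):
    """Return consecutive differences: values[i+1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def growth_degree(samples, tol=1e-6, max_degree=3):
    """Estimate the polynomial degree of evenly spaced samples.

    Returns 0 (constant), 1 (linear), 2 (quadratic), ... up to max_degree,
    or None if the series is higher-order or there are too few samples.
    """
    d = list(samples)
    for degree in range(max_degree + 1):
        d = differences(d)
        if not d:
            return None  # not enough samples to decide
        if all(abs(x) <= tol for x in d):
            return degree  # the (degree+1)-th differences vanish
    return None
```

For example, `growth_degree([100 + 3 * i * i for i in range(8)])` returns `2`, while a series that grows by a fixed amount each step returns `1`.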

Python3 script instead of bash

> `os.system(f"./quantize {os.path.join('models', sys.argv[1], i)} {os.path.join('models', sys.argv[1], i.replace('f16', 'q4_0'))} 2")`

Consider using something like `subprocess.call` to prevent security issues such as command injection through the filename.
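A minimal sketch of that suggestion, assuming the same `./quantize <src> <dst> 2` invocation as the quoted line. Passing the command as a list means no shell parses it, so a filename containing `;` or `$(...)` is handed to `./quantize` as a single literal argument. The function names `quantize_cmd` and `quantize` are hypothetical.

```python
import os
import subprocess


def quantize_cmd(model_dir, fname, qtype="2"):
    """Build the ./quantize argument list for one f16 model file."""
    src = os.path.join("models", model_dir, fname)
    dst = os.path.join("models", model_dir, fname.replace("f16", "q4_0"))
    # Each argument is a separate list element; no shell string is built.
    return ["./quantize", src, dst, qtype]


def quantize(model_dir, fname):
    """Run ./quantize directly (subprocess.call uses shell=False by default)."""
    return subprocess.call(quantize_cmd(model_dir, fname))
```

In the script's loop, `os.system(...)` would become `quantize(sys.argv[1], i)`.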

Vision with llava-1.6-7B is unusable via CLI

Also seeing the same issue, where llava from ollama performs significantly worse than other web-hosted versions.

> I loaded lava 7b with version 0.1.32 and I get a good...

Vision with llava-1.6-7B is unusable via CLI

Interestingly, restarting the ollama server makes the first image query work. For every image query after the first (even with a fresh client session), the model will just output...
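A rough repro sketch of the behavior described above: send several image queries in a row to a freshly restarted server and compare the answers. It assumes ollama's REST endpoint `POST /api/generate` on the default port 11434, with images passed as base64 strings and `"stream": false` so the reply arrives as one JSON object. The model tag and image paths are placeholders; substitute your own.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default host/port


def build_payload(model, prompt, image_bytes):
    """Build a non-streaming /api/generate request body with one image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask(payload, url=OLLAMA_URL):
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def repro(model, image_paths, prompt="Describe this image."):
    """Query each image in order; per the report, only the first answer is sane."""
    for path in image_paths:
        with open(path, "rb") as f:
            payload = build_payload(model, prompt, f.read())
        print(path, "->", ask(payload))
```

Usage, after restarting the server: `repro("llava", ["first.jpg", "second.jpg"])`.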

Vision with llava-1.6-7B is unusable via CLI

Looking at the release notes for 0.1.34, I think this is already addressed:

> - Fixed issues with LLaVa models where they would respond incorrectly after the first request

Seems...