6 issues by Aaron Miller

Colors are currently inconsistent across different machines (or OS configurations?). Ideally these should all be defined in one place in the QML instead of being spread out everywhere. ![image](https://user-images.githubusercontent.com/169252/233662482-d5638931-d4af-45db-bad0-4e235360bebd.png) ![image](https://user-images.githubusercontent.com/169252/233662566-fe9fde2f-5c9b-48d5-ae6a-54cdf93e8147.png)

Re-do of [#661](https://github.com/nomic-ai/gpt4all/pull/661), leaving as a draft until the build/linking issues are solved. Improves output quality by making these tokenizers more closely match the behavior of the Hugging Face `tokenizers`-based BPE...

Feature request: a straightforward ggml implementation of a model allocates enough memory to hold the activations from *all the model layers at once*. Outside of explicitly asking it to,...

https://github.com/ggerganov/ggml/issues/217. Adapted from the gpt-neox example and the work started in https://github.com/ggerganov/llama.cpp/issues/1602. Only supports 7B right now; 40B multi-query attention gets hairier, as it's 128 query heads with 8 k and...
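To make the "128 query heads with 8 k..." point concrete, here is a minimal sketch of how query heads could map onto a smaller set of shared key/value heads. It assumes the truncated text means 8 key heads (with matching value heads) and that each one serves a contiguous group of query heads; the real weight layout in the 40B model may be interleaved differently, and the names `N_HEAD`, `N_HEAD_KV` and `kv_head_for_query_head` are hypothetical, not taken from ggml or llama.cpp.

```python
# Hypothetical sketch of grouped/multi-query attention head sharing.
# Assumption: 8 K/V heads, each shared by a contiguous block of query heads.
N_HEAD = 128    # query heads (from the issue text)
N_HEAD_KV = 8   # K/V heads (assumed reading of the truncated "8 k and...")


def kv_head_for_query_head(q: int, n_head: int = N_HEAD, n_head_kv: int = N_HEAD_KV) -> int:
    """Return the index of the K/V head that query head `q` attends with."""
    if n_head % n_head_kv != 0:
        raise ValueError("n_head must be a multiple of n_head_kv")
    if not 0 <= q < n_head:
        raise ValueError("query head index out of range")
    group_size = n_head // n_head_kv  # 16 query heads per K/V head here
    return q // group_size


if __name__ == "__main__":
    print(N_HEAD // N_HEAD_KV)  # 16
    print([kv_head_for_query_head(q) for q in (0, 15, 16, 127)])  # [0, 0, 1, 7]
```

Under this assumption every K/V head is broadcast across 16 query heads, which is part of why the 40B path needs more work than the 7B one, where this grouping does not apply in the same way.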

Noticed this and initially thought it was a difference between q4_k and q4_0, but it's just that smaller models require a higher `ctx-size` to break; it appears to be poorly...

This is mostly to fix https://github.com/allusion-app/Allusion/issues/448, but it also speeds up thumbnail generation quite a bit. I know this was [tried before](https://github.com/allusion-app/Allusion/pull/365) and ran into build issues. I looked for...