cduk
Just wondering if there were any developments on this front. I guess my use case is a simple one as I have around 3-4 million embeddings to index which is...
I had the same problem. I used git to download the files, and the repo used LFS, which meant the files contained only pointers to the real files. A quick view of the file...
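One way around the pointer-stub problem (a sketch, assuming the repo in question is hosted on the Hugging Face Hub; the repo id below is illustrative) is to skip git entirely and let `huggingface_hub` resolve the real LFS files:

```python
# Sketch: fetch the actual weight files (not LFS pointer stubs) via huggingface_hub.
# The repo id is illustrative, not necessarily the one from the original thread.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="stabilityai/stablelm-tuned-alpha-7b")
print(local_dir)  # local cache directory containing the fully downloaded files
```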
> For the sake of convenience (2x less download size/RAM/VRAM), I've uploaded 16-bit versions of tuned models to HF Hub: https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-7b-16bit https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit

Would you mind showing how you made the...
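The rest of the question is cut off, but a minimal sketch of the usual way to produce such 16-bit copies (load in fp16 with `transformers`, then push to your own Hub repo) looks like this; whether it matches the author's exact procedure is an assumption, and the source checkpoint name is illustrative:

```python
# Minimal sketch: re-save a checkpoint in fp16 and push it to the Hub.
# Requires `huggingface-cli login`; the target repo name follows the links above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "stabilityai/stablelm-tuned-alpha-7b"       # original fp32 checkpoint (assumed)
dst = "vvsotnikov/stablelm-tuned-alpha-7b-16bit"  # target repo from the link above

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(src)

model.push_to_hub(dst)
tokenizer.push_to_hub(dst)
```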
@antheas So close! Have you considered quantizing to 8-bit and seeing how well that works? I wonder whether an 8-bit 7B would outperform an fp16 3B. Both seem like they would fit...
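For reference, a sketch of how 8-bit loading is commonly done through `transformers` and `bitsandbytes` (the model id is an assumption; whether 8-bit 7B actually beats fp16 3B would still need to be measured):

```python
# Sketch: load the 7B model in 8-bit via bitsandbytes (requires a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-tuned-alpha-7b"  # assumed checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```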
I don't know if this could be helpful: https://github.com/tpope/vim-dispatch. They seem to run async work in the background. For LLMs it may be more complex, as it impacts the...
The simpler way would be not to deal with loading and unloading at all: require that all models fit in VRAM, and then you select which one to use in the API...
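A sketch of that approach (everything resident in VRAM, selection by name per request; the model names and the `generate` helper are hypothetical):

```python
# Sketch: keep every model loaded at startup and pick one per API call.
# Only works if all models fit in VRAM at the same time.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_IDS = {
    "stablelm-3b": "stabilityai/stablelm-tuned-alpha-3b",
    "stablelm-7b": "stabilityai/stablelm-tuned-alpha-7b",
}

# Load everything once; no later loading/unloading.
MODELS = {
    name: (
        AutoModelForCausalLM.from_pretrained(repo, device_map="auto"),
        AutoTokenizer.from_pretrained(repo),
    )
    for name, repo in MODEL_IDS.items()
}

def generate(model_name: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Route the request to whichever resident model was asked for."""
    model, tokenizer = MODELS[model_name]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```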
What changes did you plan to make with the tokenizer?
From Google Translate:
> When we started using it in 2017, we only used master and volume. Recently, we wanted to reorganize the files within the file system. However, we don't...
I used an RTX 3090 (24 GB VRAM). I will quantize offline; that is more efficient anyway.