KUNPENG GUO
Does anyone have updates on this PR? It would be great if it were merged into the main codebase.
I got this error, which might be related:
```
FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.
```
mentioned in #442
Hey @DarkLight1337, can we add an upper-limit argument to configure how much VRAM the server process is allowed to grab, to avoid surprises in deployments? Or...
Hey @DarkLight1337,
> You can set `--gpu-memory-utilization` to cap the GPU memory usage

1) That won't work for the encoder-based embedder.
2) It is in fact considering only the...
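For context, `--gpu-memory-utilization` expresses the cap as a fraction of total VRAM rather than an absolute limit. A minimal sketch of the implied byte budget, assuming a hypothetical 80 GB card (the card size and helper name are illustrative, not from vLLM):

```python
def vram_budget_bytes(total_vram_bytes: int, gpu_memory_utilization: float) -> int:
    """Return the byte budget implied by a fractional VRAM utilization cap."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return int(total_vram_bytes * gpu_memory_utilization)

# Hypothetical 80 GB card capped at 90% utilization:
total = 80 * 1024**3          # 85899345920 bytes
print(vram_budget_bytes(total, 0.9))  # -> 77309411328
```

An absolute cap in bytes, as asked for above, would need a different argument; this only shows how the fractional flag maps to memory.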
Update: currently, if one deploys [bge-model](https://huggingface.co/BAAI/bge-large-en-v1.5), the memory grows over time... it breaks the server with OOM from time to time, so we have to restart it.