Results: 2 issues of deufs
An fp16 KV cache would greatly reduce memory usage without significant loss of quality.
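To put a rough number on the memory claim, here is a minimal sketch of the usual KV cache size estimate. The issue does not name a model or a baseline precision, so the shape below (32 layers, 8 KV heads, head dimension 128, 4096-token context) and the fp32 baseline are assumptions chosen for illustration, and `kv_cache_bytes` is a hypothetical helper, not part of any library.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem, batch_size=1):
    """Approximate size of a transformer KV cache in bytes."""
    # Two tensors (keys and values) are cached per layer, each holding
    # n_kv_heads * head_dim values per token.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed Mistral-7B-like shape; not taken from the issue itself.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

fp32_bytes = kv_cache_bytes(**shape, bytes_per_elem=4)  # 4 bytes per value
fp16_bytes = kv_cache_bytes(**shape, bytes_per_elem=2)  # 2 bytes per value

print(f"fp32: {fp32_bytes / 2**20:.0f} MiB")
print(f"fp16: {fp16_bytes / 2**20:.0f} MiB")
```

Under these assumptions the cache halves from 1024 MiB to 512 MiB per sequence. The saving scales linearly with context length and batch size, so it matters most for long contexts or many concurrent requests.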
```
! cd multi_token && python scripts/serve_model.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --model_lora_path sshh12/Mistral-7B-LoRA-AudioCLAP \
    --port 7860
```
```
2024-02-25 00:08:32.729122: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register...
```