Johannes Vass
I just read about SGLang's approach to constrained decoding. Did you consider adding that to vLLM instead of Outlines? See, for example, this blog post: https://lmsys.org/blog/2024-02-05-compressed-fsm/
I'm also hitting a memory leak with vllm 0.2.7. For me it's not limited to Ray; it also affects the API server itself, no matter whether I use...
For now, my workaround is to set a memory limit and restart vLLM automatically after an OOM.
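One way to sketch that workaround is with Docker's memory cap plus a restart policy; this is only an illustration, and the image name, port, memory limit, and model are assumptions you'd adapt to your own deployment:

```shell
# Cap host RAM for the container; if the leak drives usage past the limit,
# the kernel OOM-kills the process and --restart brings the server back up.
docker run -d \
  --name vllm-server \
  --gpus all \
  --memory 16g \
  --restart unless-stopped \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2   # illustrative model choice
```

The same idea works with a systemd unit (`MemoryMax=` and `Restart=always`) if you run vLLM outside containers.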
> I have a similar question that might be related to it. I see that it's not possible (at least via GUI) to remove files at document set/connector level (hence,...
Which exact settings of the `GEN_AI_` variables did you try? For me the following works with a self-hosted Huggingface TGI:

```
GEN_AI_MODEL_VERSION=""
GEN_AI_MODEL_PROVIDER="huggingface"
HUGGINGFACE_API_BASE="https://xyz"
GEN_AI_API_ENDPOINT="https://xyz"
```

Disclaimer: I am unsure...
@mad-mikey do you have an estimate for when you will be able to contribute this?
There is already another issue regarding this: #984
```
In [1]: import DeepInstruments
Using TensorFlow backend.
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in <module>()
----> 1 import DeepInstruments

/Users/johannesvass/ownCloud/Studium/2017S_Bachelorarbeit/ismir2016/DeepInstruments/__init__.py in <module>()
     51 import DeepInstruments.audio
     52 import DeepInstruments.descriptors
...
```
### Problem Analysis The issue seems to be a breaking change in the `tokenizers` library (probably https://github.com/huggingface/tokenizers/pull/1476) which prevents an XLM-Roberta tokenizer saved with a version >= `0.19.0` from being...
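If that diagnosis is right, a possible stopgap (a sketch, assuming the incompatibility is purely version-related and `0.19.0` is indeed the breaking release) is to pin `tokenizers` below it before saving the tokenizer:

```shell
# Assumption: tokenizer files written by tokenizers >= 0.19.0 fail to load
# in older versions. Pin below the suspected breaking release so saved
# tokenizer.json files remain loadable by older consumers.
pip install "tokenizers<0.19.0"
```

This only avoids the mismatch; the proper fix would be to re-save the tokenizer once all consumers are on a compatible `tokenizers` version.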