Berat Çimen
LangChain integration is easier than it looks. You can wrap `vllm.LLM` as a custom LLM. [Doc link](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/custom_llm)
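A minimal sketch of the custom-LLM shape LangChain expects (a subclass implementing `_call` and `_llm_type`). In a real project you would subclass `langchain.llms.base.LLM` and hold a `vllm.LLM` engine; both are stubbed here so the sketch runs standalone, and all class names are hypothetical.

```python
class StubVLLMEngine:
    """Stand-in for vllm.LLM so the sketch has no external dependencies."""

    def generate(self, prompts):
        # A real vLLM engine returns generation results; we just echo.
        return [f"echo: {p}" for p in prompts]


class VLLMWrapper:
    """Shape of a LangChain custom LLM: provide _llm_type and _call."""

    def __init__(self, engine):
        self.engine = engine

    @property
    def _llm_type(self) -> str:
        # LangChain uses this string for logging/serialization.
        return "vllm"

    def _call(self, prompt: str, stop=None) -> str:
        # LangChain hands _call a single prompt string; return one string.
        return self.engine.generate([prompt])[0]


llm = VLLMWrapper(StubVLLMEngine())
print(llm._call("hello"))  # echo: hello
```

With the real base class, `llm("hello")` would route through `_call` automatically; the wrapper above only shows the two members you need to implement.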
GPT-J and GPT-NeoX have similar architectures, so a hacky solution may be possible for now.
> Are you wanting `load_in_8bit` from HF or would you consider the AWQ GPTQ support sufficient? @hmellor cloud compute costs add up when quantizing models to AWQ and GPTQ, so having...
@hmellor do models quantized with BnB and uploaded to the Hub work with vLLM?
Same issue on `OS Build 22631.3296`
Is there any news about this issue?
Thank you both @charliermarsh and @zanieb! Setting the env variable `UV_CACHE_DIR="E:\uv\cache"` worked! Here is the terminal output:

```bash
[08:21:02] berat /e/Projects/python/uv_test>uv --help
An extremely fast Python package manager....
```
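For anyone hitting the same thing, a sketch of the fix in a POSIX-style shell (Git Bash); the path is the one from my setup, so adjust it to your drive. On Windows cmd you would use `set`/`setx` instead.

```shell
# Point uv's cache at a drive with more space (hypothetical path).
export UV_CACHE_DIR="/e/uv/cache"

# Confirm the variable is set before re-running uv.
echo "$UV_CACHE_DIR"
```

Setting the variable in your shell profile makes it persist across sessions.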
Thanks @Krobys for the PR. Absolute legend!
Hey @nbonamy, I have a similar issue with the OpenAI Agents SDK and OpenRouter. Could you give me a hand by explaining how you implemented the fix?