Ming Lin
Hello! I followed the D+S packing instructions and stored the packed .pt file in "~/models/${model_name}-squeezellm/packed_weight", where model_name="Llama-2-7b-chat-hf". When I load this model in vLLM: ``` python examples/llm_engine_example.py --dtype float16 --model ~/models/${model_name}-squeezellm/packed_weight...
This is really nice work! I followed the instructions to quantize Llama-2-7b-chat-hf. At the k-means clustering step, I ran the following command: ``` python nuq.py --bit 4 --model_type llama --model...
# Pull Request Checklist ### Note to first-time contributors: Please open a discussion post in [Discussions](https://github.com/open-webui/open-webui/discussions) and describe your changes before submitting a pull request. Discussion thread: [Enable OpenAI Built-in...