Shuchang Zhou
Here is an example of a 4-bit ImageNet model: https://github.com/megvii-research/Sparsebit/blob/main/examples/imagenet_qat/README.md It might be straightforward to extend it to 2-bit. Please give it a try!
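For context, here is a minimal sketch of what changing the bit-width means for a symmetric uniform quantizer (generic PyTorch, not Sparsebit's actual API; the `k_bit_quantize` helper is hypothetical):

```python
import torch

def k_bit_quantize(x: torch.Tensor, k: int = 2) -> torch.Tensor:
    # Hypothetical helper: symmetric uniform quantization to k bits.
    qmax = 2 ** (k - 1) - 1               # e.g. k=2 -> levels {-2, -1, 0, 1}
    qmin = -(2 ** (k - 1))
    scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale                      # dequantize back to float

x = torch.randn(4, 4)
print(k_bit_quantize(x, k=2))             # only 4 distinct values survive
```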
> > does this work with int8?
>
> No idea! I haven't messed with int8 too much myself. It ought to be compatible with whatever is already supported in...
@zphang I'm not able to get something like `tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")` to work. Is this intentional or just leaving AutoTokenizer for future work?
This is OK: `tokenizer = transformers.LLaMATokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")`

If using `tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")` instead, it will complain that there is no "config.json":

```
OSError: /data/llama/hf/7b/tokenizer/ does not appear to have a file named config.json. Checkout...
```
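One possible workaround, sketched below (this rests on an assumption about how `AutoTokenizer` dispatches, not something confirmed in this thread): it can resolve the class from a `"tokenizer_class"` entry in `tokenizer_config.json`, so writing a minimal file avoids the `config.json` lookup. The class name depends on your transformers version ("LLaMATokenizer" in the early port).

```python
import json
import transformers

path = "/data/llama/hf/7b/tokenizer/"
# Minimal tokenizer_config.json telling AutoTokenizer which class to use.
with open(path + "tokenizer_config.json", "w") as f:
    json.dump({"tokenizer_class": "LLaMATokenizer"}, f)

tokenizer = transformers.AutoTokenizer.from_pretrained(path)
```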
> Has anyone tested loading 65B with `accelerate` to load on multiple GPUs?

| | fp16 | int8 (bitsandbytes) |
|--|--|--|
| V100 | OK, 5xV100 | Bad results, short generated sequences |
| A100 | OK, 6xA100 when using "auto" | OK, 3xA100 |

Yes, I currently...
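For reference, the two loading modes behind those table rows look roughly like this (a sketch; the checkpoint path is a placeholder, and the LLaMA class naming varies by transformers version):

```python
import torch
from transformers import AutoModelForCausalLM

ckpt = "/data/llama/hf/65b/"   # placeholder path

# fp16, sharded across GPUs by accelerate (the "auto" device map):
model_fp16 = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto")

# int8 through bitsandbytes; fits on fewer GPUs per the table above:
model_int8 = AutoModelForCausalLM.from_pretrained(
    ckpt, load_in_8bit=True, device_map="auto")
```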
Indeed, this fix is required for BLOOM. https://github.com/huggingface/transformers/compare/main...zsc:transformers:main (my fix is hacky and not PR-ready. Just FYI)
@aspctu How many and what GPUs did you use to run the model inference? For the smaller alpaca-13b I kept getting CUDA OOM despite much effort tweaking `device_map`.
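For the record, the knob I was tweaking looks like this (a sketch; the path and memory budgets are made-up numbers for illustration):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical budgets: cap each GPU well below its physical memory so
# activations and the generation KV cache still fit; spill the rest to CPU.
max_memory = {0: "18GiB", 1: "18GiB", "cpu": "64GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "/path/to/alpaca-13b",      # placeholder path
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory=max_memory,
)
```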
I also encountered this mysterious "'NoneType' object has no attribute 'device'" bug. My solution is to use [export_hf_checkpoint.py](https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py) to convert the base+LoRA model to a vanilla model, and then use...
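For what it's worth, the same base+LoRA merge can also be done with peft's built-in helper instead of the export script (a sketch; paths are placeholders, `merge_and_unload` needs a recent-enough peft, and the class name varies by transformers version):

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "/data/llama/hf/7b/", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")

# Fold the LoRA deltas into the base weights; the result is a vanilla
# model that plain transformers can load without peft.
model = model.merge_and_unload()
model.save_pretrained("./hf_ckpt")
```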
@nenkoru It's really as straightforward as replacing the two occurrences of "7" with "30" in `export_hf_checkpoint.py`. And when you get the `./hf_ckpt`, just point your `model.from_pretrained` to that directory to...
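Concretely, loading the merged checkpoint would look something like this (a sketch, assuming the merged 30B weights ended up in `./hf_ckpt`):

```python
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "./hf_ckpt",                # the merged base+LoRA weights
    torch_dtype=torch.float16,
    device_map="auto",
)
```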