Shuchang Zhou
Here is an example of a 4-bit ImageNet model: https://github.com/megvii-research/Sparsebit/blob/main/examples/imagenet_qat/README.md It might be straightforward to extend it to 2-bit. Please give it a try!
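For context, here is a minimal sketch of what changing the bit-width means for a symmetric uniform quantizer (generic PyTorch, not Sparsebit's actual API; the `k_bit_quantize` helper is hypothetical):

```python
import torch

def k_bit_quantize(x: torch.Tensor, k: int = 2) -> torch.Tensor:
    # Hypothetical helper: symmetric uniform quantization to k bits.
    qmax = 2 ** (k - 1) - 1               # e.g. k=2 -> levels {-2, -1, 0, 1}
    qmin = -(2 ** (k - 1))
    scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale                      # dequantize back to float

x = torch.randn(4, 4)
print(k_bit_quantize(x, k=2))             # only 4 distinct values survive
```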
> > does this work with int8?
>
> No idea! I haven't messed with int8 too much myself. It ought to be compatible with whatever is already supported in...
@zphang I'm not able to get something like `tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")` to work. Is this intentional or just leaving AutoTokenizer for future work?
This is OK: `tokenizer = transformers.LLaMATokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")`

If using `tokenizer = AutoTokenizer.from_pretrained("/data/llama/hf/7b/tokenizer/")` instead, it will complain that there is no "config.json":

```
OSError: /data/llama/hf/7b/tokenizer/ does not appear to have a file named config.json. Checkout...
```
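One possible workaround, sketched below (this rests on an assumption about how `AutoTokenizer` dispatches, not something confirmed in this thread): it can resolve the class from a `"tokenizer_class"` entry in `tokenizer_config.json`, so writing a minimal file avoids the `config.json` lookup. The class name depends on your transformers version ("LLaMATokenizer" in the early port).

```python
import json
import transformers

path = "/data/llama/hf/7b/tokenizer/"
# Minimal tokenizer_config.json telling AutoTokenizer which class to use.
with open(path + "tokenizer_config.json", "w") as f:
    json.dump({"tokenizer_class": "LLaMATokenizer"}, f)

tokenizer = transformers.AutoTokenizer.from_pretrained(path)
```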
> Has anyone tested loading 65B with `accelerate` to load on multiple GPUs?

| | fp16 | int8 (bitsandbytes) |
|--|--|--|
| V100 | OK, 5xV100 | Bad results, short generated sequences |
| A100 | OK, 6xA100 when using "auto" | OK, 3xA100 |

Yes, I currently...
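For reference, the two loading modes behind those table rows look roughly like this (a sketch; the checkpoint path is a placeholder, and the LLaMA class naming varies by transformers version):

```python
import torch
from transformers import AutoModelForCausalLM

ckpt = "/data/llama/hf/65b/"   # placeholder path

# fp16, sharded across GPUs by accelerate (the "auto" device map):
model_fp16 = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto")

# int8 through bitsandbytes; fits on fewer GPUs per the table above:
model_int8 = AutoModelForCausalLM.from_pretrained(
    ckpt, load_in_8bit=True, device_map="auto")
```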
Indeed, this fix is required for BLOOM. https://github.com/huggingface/transformers/compare/main...zsc:transformers:main (my fix is hacky and not PR-ready. Just FYI)
@aspctu How many and what GPUs did you use to run the model inference? For the smaller alpaca-13b I kept getting CUDA OOM despite much effort tweaking `device_map`.
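For the record, the knob I was tweaking looks like this (a sketch; the path and memory budgets are made-up numbers for illustration):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical budgets: cap each GPU well below its physical memory so
# activations and the generation KV cache still fit; spill the rest to CPU.
max_memory = {0: "18GiB", 1: "18GiB", "cpu": "64GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "/path/to/alpaca-13b",      # placeholder path
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory=max_memory,
)
```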
I also encountered this mysterious "'NoneType' object has no attribute 'device'" bug. My solution is to use [export_hf_checkpoint.py](https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py) to convert the base+LoRA model to a vanilla model, and then use...
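For what it's worth, the same base+LoRA merge can also be done with peft's built-in helper instead of the export script (a sketch; paths are placeholders, `merge_and_unload` needs a recent-enough peft, and the class name varies by transformers version):

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "/data/llama/hf/7b/", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")

# Fold the LoRA deltas into the base weights; the result is a vanilla
# model that plain transformers can load without peft.
model = model.merge_and_unload()
model.save_pretrained("./hf_ckpt")
```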
@nenkoru It's really as straightforward as replacing the two occurrences of "7" with "30" in `export_hf_checkpoint.py`. And when you get the `./hf_ckpt`, just point your `model.from_pretrained` to that directory to...
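Concretely, loading the merged checkpoint would look something like this (a sketch, assuming the merged 30B weights ended up in `./hf_ckpt`):

```python
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "./hf_ckpt",                # the merged base+LoRA weights
    torch_dtype=torch.float16,
    device_map="auto",
)
```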