HPT
Quantization
Hi, thank you for this work. How can I quantize the model to int8? Any comments are appreciated.
You can quantize our model at run time using bitsandbytes. Please refer to the steps below.
- Install bitsandbytes
pip install bitsandbytes
- Modify the vlmeval/vlm/hpt.py file as follows.
from transformers import BitsAndBytesConfig

# this part should go at line 53
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
llm = AutoModelForCausalLM.from_pretrained(
    global_model_path,
    quantization_config=quant_config,
    subfolder='llm',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch_dtype,
    device_map='cuda',
)
self.llm = llm
- Then you can proceed to run the demo:
python demo/demo.py --image_path demo/einstein.jpg --text 'Question: What is unusual about this image?\nAnswer:' --model hpt-air-demo-local
This is the output on my end.
The unusual aspect of this image is the depiction of a famous scientist, such as Albert Einstein, holding a cell phone. It is not common to see a renowned scientist using a cell phone, as they are often associated with more traditional communication methods or research. The painting or drawing of Einstein holding a cell phone is an interesting and unexpected representation of the scientist's daily life or activities.
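For a rough sense of the memory saved, weight memory scales linearly with bits per parameter (parameters × bits / 8, ignoring quantization constants and activations). A small back-of-the-envelope helper, using a hypothetical 7B-parameter LLM as the example:

```python
def estimated_weight_bytes(num_params: int, bits_per_param: float) -> int:
    """Rough weight-memory estimate: parameters * bits / 8.

    Ignores quantization metadata (scales, zero points) and
    activation/KV-cache memory, so treat it as a lower bound.
    """
    return int(num_params * bits_per_param / 8)

params = 7_000_000_000  # hypothetical 7B-parameter model
fp16_bytes = estimated_weight_bytes(params, 16)  # ~14.0 GB
int8_bytes = estimated_weight_bytes(params, 8)   # ~7.0 GB
nf4_bytes = estimated_weight_bytes(params, 4)    # ~3.5 GB
```

This is why the 4-bit config above fits on GPUs where the fp16 checkpoint would not.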