HPT
Quantization
Hi, thank you for this work. How can I quantize the model to int8? Any comments are appreciated.
You can quantize our model at run time using bitsandbytes. Please refer to the steps below.
- Install bitsandbytes
pip install bitsandbytes
- Modify the vlmeval/vlm/hpt.py file as follows.
from transformers import BitsAndBytesConfig

# this part should go at line 53
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
llm = AutoModelForCausalLM.from_pretrained(
    global_model_path,
    quantization_config=quant_config,
    subfolder='llm',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch_dtype,
    device_map='cuda',
)
self.llm = llm
- Then you can proceed to run the demo:
python demo/demo.py --image_path demo/einstein.jpg --text 'Question: What is unusual about this image?\nAnswer:' --model hpt-air-demo-local
This is the output on my end.
The unusual aspect of this image is the depiction of a famous scientist, such as Albert Einstein, holding a cell phone. It is not common to see a renowned scientist using a cell phone, as they are often associated with more traditional communication methods or research. The painting or drawing of Einstein holding a cell phone is an interesting and unexpected representation of the scientist's daily life or activities.
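For a rough sense of the memory saved, weight memory scales linearly with bits per parameter (parameters × bits / 8, ignoring quantization constants and activations). A small back-of-the-envelope helper, using a hypothetical 7B-parameter LLM as the example:

```python
def estimated_weight_bytes(num_params: int, bits_per_param: float) -> int:
    """Rough weight-memory estimate: parameters * bits / 8.

    Ignores quantization metadata (scales, zero points) and
    activation/KV-cache memory, so treat it as a lower bound.
    """
    return int(num_params * bits_per_param / 8)

params = 7_000_000_000  # hypothetical 7B-parameter model
fp16_bytes = estimated_weight_bytes(params, 16)  # ~14.0 GB
int8_bytes = estimated_weight_bytes(params, 8)   # ~7.0 GB
nf4_bytes = estimated_weight_bytes(params, 4)    # ~3.5 GB
```

This is why the 4-bit config above fits on GPUs where the fp16 checkpoint would not.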