Incccc

20 comments by Incccc

Even though FT has not supported int8 quantization for popular LLMs like BLOOM, quantizing to 4-bit does not necessarily accelerate the model; it depends on your hardware's compute support, and on the model accuracy...
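A minimal sketch of why this is the case, assuming plain groupwise symmetric quantization (the function names, shapes, and group size below are my own illustration, not FT's implementation): without native low-bit kernels, 4-bit weights have to be dequantized back to fp16 before every matmul, and that extra pass can eat the bandwidth savings.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    """Groupwise symmetric 4-bit quantization: int4 codes + one fp16 scale per group."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_4bit(q, scale):
    """Dequantize back to fp16 before the matmul -- on hardware without
    fused low-bit kernels this extra pass is pure overhead, which is why
    4-bit weights alone do not guarantee a speedup."""
    return (q.astype(np.float16) * scale).reshape(-1)

w = np.random.randn(4096 * 4096).astype(np.float16)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```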

Mm-hm, may I ask whether there is an offline installation package with all the plugins already downloaded, for environments without external network access, so it can be installed offline?

> Has this issue been solved??? Pls kindly reply

Yeah, the implementation of the BatchNorm layer differs across Caffe versions, so you need to check your Caffe version; for example, NVIDIA's Caffe implement...
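To illustrate the version difference, here is a small numpy sketch (my own illustration, not Caffe source): BVLC Caffe's `BatchNorm` layer only normalizes and relies on a separate `Scale` layer for gamma/beta, while some forks fuse the affine step into the batch-norm layer itself, so a converter that ignores the `Scale` layer silently changes the output.

```python
import numpy as np

def bn_normalize_only(x, mean, var, eps=1e-5):
    """BVLC-Caffe-style BatchNorm: normalization only; gamma/beta live in a
    separate Scale layer."""
    return (x - mean) / np.sqrt(var + eps)

def scale_layer(x, gamma, beta):
    """BVLC Caffe's Scale layer: the affine half of batch norm."""
    return gamma * x + beta

def bn_fused(x, mean, var, gamma, beta, eps=1e-5):
    """Fused-style BatchNorm (as in some forks): normalize + affine in one layer."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(8, 16)
mean, var = x.mean(axis=0), x.var(axis=0)
gamma, beta = 1.5 * np.ones(16), 0.1 * np.ones(16)

# Equivalent only if the Scale layer is carried over during conversion:
a = scale_layer(bn_normalize_only(x, mean, var), gamma, beta)
b = bn_fused(x, mean, var, gamma, beta)
print(np.allclose(a, b))  # True; dropping the Scale layer would change the output
```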

Thanks for your reply. Has this PR been fully implemented yet, and does it support quantization now? Since the model is 25B in size, an A10 GPU is not able to afford...

Actually, InternVL-Chat-V1.5 is a 25B model; it needs about 46.5 GB of memory to load in fp16 format. BTW, I thought InternVL-Chat-V1.5 and LMdeploy are both from Shanghai AI Lab...
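The 46.5G figure is just parameter count times bytes per value; a quick back-of-the-envelope check (weights only, ignoring KV cache, activations, and framework overhead):

```python
def fp16_weight_gib(num_params):
    """Weights-only footprint at 2 bytes/param, in GiB; excludes KV cache,
    activations, and framework overhead."""
    return num_params * 2 / 2**30

print(f"{fp16_weight_gib(25e9):.1f} GiB")  # ~46.6 GiB for a 25B model
```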

This should be a bug; following the hint and prefixing the variable with an underscore, it compiles.

@horseee Has this been supported yet? Can we prune the bloom-series models now? Also I have a question: how can users automatically set params like block_mlp_layer_start and block_mlp_layer_end...
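For illustration only, one heuristic for deriving such bounds automatically from the layer count (`block_mlp_layer_start`/`block_mlp_layer_end` are LLM-Pruner's parameters, but the logic below is my own sketch, not the library's): skip the first and last few blocks, which are commonly the most sensitive to pruning.

```python
def auto_block_range(num_layers, skip_front=3, skip_back=3):
    """Heuristic: restrict pruning to the middle transformer blocks, since the
    first/last few are commonly the most pruning-sensitive. The returned pair
    could feed block_mlp_layer_start / block_mlp_layer_end."""
    start, end = skip_front, num_layers - skip_back
    assert start < end, "model too shallow for these margins"
    return start, end

# e.g. for a 30-block bloom-style model:
print(auto_block_range(30))  # (3, 27)
```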

@horseee Hi, I have two questions I hope you could answer, thanks: 1. Could a model pruned by llm-pruner or other pruning tricks get better inference performance under fp16? 2. ...

@horseee Hi, thanks for your kind reply. Actually I don't intend to compare the performance of pruning and quantization, as they are two different ways of compressing a model. I mean how...
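On question 1 above, measurement settles it more reliably than theory; here is a minimal timing harness one could use to compare the original and pruned models (the models and input below are toy placeholders of my own, not the real checkpoints):

```python
import time
import torch

@torch.no_grad()
def latency_ms(model, batch, n_warmup=5, n_runs=20):
    """Average wall-clock forward latency in milliseconds; smaller weight
    matrices do not always map to faster kernels, so measuring is the only
    reliable comparison."""
    for _ in range(n_warmup):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n_runs * 1000

# Toy stand-ins (float32 on CPU just to demo the harness):
base = torch.nn.Linear(4096, 4096).eval()
pruned = torch.nn.Linear(4096, 2048).eval()  # stand-in for a pruned layer
x = torch.randn(8, 4096)
print(latency_ms(base, x), latency_ms(pruned, x))
```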