Incccc

20 comments by Incccc

Even though FT has not supported int8 quantization for popular LLMs like BLOOM, quantizing to 4-bit does not necessarily accelerate the model; it depends on your hardware's compute support, and on the model accuracy...
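A minimal sketch of why this is the case, assuming plain groupwise symmetric quantization (the function names, shapes, and group size below are my own illustration, not FT's implementation): without native low-bit kernels, 4-bit weights have to be dequantized back to fp16 before every matmul, and that extra pass can eat the bandwidth savings.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    """Groupwise symmetric 4-bit quantization: int4 codes + one fp16 scale per group."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_4bit(q, scale):
    """Dequantize back to fp16 before the matmul -- on hardware without
    fused low-bit kernels this extra pass is pure overhead, which is why
    4-bit weights alone do not guarantee a speedup."""
    return (q.astype(np.float16) * scale).reshape(-1)

w = np.random.randn(4096 * 4096).astype(np.float16)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```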

Mm-hm, may I ask whether there is an offline installation package with all the plugins already downloaded, for environments without external network access, so it can be installed offline?

> Has this issue been solved??? Pls kindly reply

Yeah, the implementation of the BatchNorm layer differs across Caffe versions, so you need to check your Caffe version; for example, NVIDIA's Caffe implement...
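To illustrate the version difference, here is a small numpy sketch (my own illustration, not Caffe source): BVLC Caffe's `BatchNorm` layer only normalizes and relies on a separate `Scale` layer for gamma/beta, while some forks fuse the affine step into the batch-norm layer itself, so a converter that ignores the `Scale` layer silently changes the output.

```python
import numpy as np

def bn_normalize_only(x, mean, var, eps=1e-5):
    """BVLC-Caffe-style BatchNorm: normalization only; gamma/beta live in a
    separate Scale layer."""
    return (x - mean) / np.sqrt(var + eps)

def scale_layer(x, gamma, beta):
    """BVLC Caffe's Scale layer: the affine half of batch norm."""
    return gamma * x + beta

def bn_fused(x, mean, var, gamma, beta, eps=1e-5):
    """Fused-style BatchNorm (as in some forks): normalize + affine in one layer."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(8, 16)
mean, var = x.mean(axis=0), x.var(axis=0)
gamma, beta = 1.5 * np.ones(16), 0.1 * np.ones(16)

# Equivalent only if the Scale layer is carried over during conversion:
a = scale_layer(bn_normalize_only(x, mean, var), gamma, beta)
b = bn_fused(x, mean, var, gamma, beta)
print(np.allclose(a, b))  # True; dropping the Scale layer would change the output
```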

Thanks for your reply. Has this PR been fully implemented yet, and does it support quantization now? Since the model is 25B in size, an A10 GPU is not able to afford...

Actually, InternVL-Chat-V1.5 is a 25B model; it needs about 46.5 GB of memory to load in fp16 format. BTW, I thought InternVL-Chat-V1.5 and LMdeploy are both from Shanghai AI Lab...
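The 46.5G figure is just parameter count times bytes per value; a quick back-of-the-envelope check (weights only, ignoring KV cache, activations, and framework overhead):

```python
def fp16_weight_gib(num_params):
    """Weights-only footprint at 2 bytes/param, in GiB; excludes KV cache,
    activations, and framework overhead."""
    return num_params * 2 / 2**30

print(f"{fp16_weight_gib(25e9):.1f} GiB")  # ~46.6 GiB for a 25B model
```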

This should be a bug; following the hint and prefixing the variable with an underscore, it compiles.

@horseee Has this been supported yet? Can we prune the bloom-series models now? Also I have a question: how can users automatically set params like block_mlp_layer_start and block_mlp_layer_end...
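For illustration only, one heuristic for deriving such bounds automatically from the layer count (`block_mlp_layer_start`/`block_mlp_layer_end` are LLM-Pruner's parameters, but the logic below is my own sketch, not the library's): skip the first and last few blocks, which are commonly the most sensitive to pruning.

```python
def auto_block_range(num_layers, skip_front=3, skip_back=3):
    """Heuristic: restrict pruning to the middle transformer blocks, since the
    first/last few are commonly the most pruning-sensitive. The returned pair
    could feed block_mlp_layer_start / block_mlp_layer_end."""
    start, end = skip_front, num_layers - skip_back
    assert start < end, "model too shallow for these margins"
    return start, end

# e.g. for a 30-block bloom-style model:
print(auto_block_range(30))  # (3, 27)
```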

@horseee Hi, I have two questions I hope you could answer, thanks: 1. Could a model pruned by llm-pruner or other pruning tricks get better inference performance under fp16? 2. ...

@horseee Hi, thanks for your kind reply. Actually I don't intend to compare the performance of pruning and quantization, as they are two different ways of compressing a model. I mean how...
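On question 1 above, measurement settles it more reliably than theory; here is a minimal timing harness one could use to compare the original and pruned models (the models and input below are toy placeholders of my own, not the real checkpoints):

```python
import time
import torch

@torch.no_grad()
def latency_ms(model, batch, n_warmup=5, n_runs=20):
    """Average wall-clock forward latency in milliseconds; smaller weight
    matrices do not always map to faster kernels, so measuring is the only
    reliable comparison."""
    for _ in range(n_warmup):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n_runs * 1000

# Toy stand-ins (float32 on CPU just to demo the harness):
base = torch.nn.Linear(4096, 4096).eval()
pruned = torch.nn.Linear(4096, 2048).eval()  # stand-in for a pruned layer
x = torch.randn(8, 4096)
print(latency_ms(base, x), latency_ms(pruned, x))
```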