Oxi84

62 comments of Oxi84

Good to know. You are doing a great job. So is it now faster or slower than fp16 for the GPT-J case? I will try it in a few days myself. So far I...

For me it takes around 250 seconds to generate 1000 words on an RTX 3090 when using 8-bit without `int8_threshold=0`. When using `int8_threshold=0`, the generation time is 88 seconds. For 500...
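
For reference, a minimal sketch of the 8-bit setup being timed here, assuming the current transformers/bitsandbytes integration, where the `int8_threshold` argument mentioned above appears to correspond to `llm_int8_threshold`:

```python
# Minimal sketch: GPT-J in 8-bit with the outlier threshold set to 0
# (assumes transformers with bitsandbytes installed; `llm_int8_threshold`
# is assumed to match the `int8_threshold` argument named in the comment).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "EleutherAI/gpt-j-6B"
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=0.0,  # 0 disables mixed-precision outlier handling
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```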

It is awesome you made this. Chinese GLM even works at 4 bits. https://github.com/THUDM/GLM-130B It seems to be the best language model so far.

Yes, this one is pretty fast, around 2x faster in 4 bits than fp16. But a faster QLoRA would be better, as it supports most available models. With GPTQ you can pretty...
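
A minimal sketch of the QLoRA-style 4-bit loading referred to here, assuming the transformers/bitsandbytes integration; the model name is a stand-in:

```python
# Minimal sketch: 4-bit (NF4) loading of the kind QLoRA uses, via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the QLoRA data type
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",                 # stand-in model
    quantization_config=quant_config,
    device_map="auto",
)
```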

For me it is the same thing: it is around 10 percent slower. I run a batch size of around 10-15, beam size is 4, and the sequence length is on average 15-20. Probably the...
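
A minimal sketch of the generation settings described here (batched beam search, beam size 4, short outputs); "gpt2" is a small stand-in since no model is named:

```python
# Minimal sketch of the settings above: batch of ~12 prompts, num_beams=4,
# outputs in the 15-20 token range.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
tokenizer.padding_side = "left"            # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained(model_id)

prompts = ["An example prompt"] * 12       # batch size around 10-15
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    num_beams=4,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
```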

I tried on another CPU and now it is 2x slower (without quantisation) than PyTorch, with the same settings as above: I run a batch size of around 10-15, beam size is...

It does work faster when using smaller batches and fewer cores. It is probably optimal to divide all CPU cores using PyTorch's thread-number setting and then use...
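
A minimal sketch of the core-splitting idea, using `torch.set_num_threads`; the 4-way split is an illustrative assumption:

```python
# Minimal sketch: cap PyTorch's intra-op threads at a fraction of the CPU
# cores so several worker processes can run side by side.
import os
import torch

num_workers = 4                 # hypothetical number of parallel workers
cores = os.cpu_count() or 1
torch.set_num_threads(max(1, cores // num_workers))
```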

> custom logits processors

@iiglesias-asapp I see your point - controlling at a token level may be advantageous. Nevertheless, i) without a specific common use case in mind and...
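
For context, a minimal sketch of a custom logits processor of the kind under discussion, using the `LogitsProcessor` interface from transformers; the banned-token behaviour is an illustrative example:

```python
# Minimal sketch: a custom logits processor that masks a fixed set of token
# ids at every decoding step (the token-level control discussed above).
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class BlockTokensProcessor(LogitsProcessor):
    """Illustrative example: forbid a fixed set of token ids."""

    def __init__(self, banned_token_ids):
        self.banned_token_ids = list(banned_token_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.banned_token_ids] = float("-inf")
        return scores

# Passed to generation as:
#   model.generate(**inputs,
#                  logits_processor=LogitsProcessorList([BlockTokensProcessor([13])]))
```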

It worked when I used the notebook (https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/pytorch/sequence_classification.ipynb) that goes along with the text - seems like it was updated, or I simply made some mistakes when copy-pasting...