tu2022

Results: 4 comments by tu2022

One more question: during your pretraining, does each iteration train on global-batch-size samples? And is each training sample an article truncated at length 1024, or a whole article that is split into individual sentences, with each sentence padded to 1024?
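
For concreteness, here is a minimal sketch of the two data layouts this question contrasts. The tokenizer checkpoint and the sentence splitting below are assumptions for illustration, not the repository's actual data pipeline.

```python
# Illustrative sketch of the two packing strategies asked about above.
# The tokenizer checkpoint and sentence splitting are assumptions, not the
# repository's actual data pipeline.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("fnlp/bart-base-chinese")  # assumed checkpoint
MAX_LEN = 1024

def truncate_whole_article(article: str) -> list[int]:
    # Strategy A: one sample per article, hard-truncated at MAX_LEN tokens.
    return tokenizer(article, truncation=True, max_length=MAX_LEN)["input_ids"]

def split_into_sentences_and_pad(article: str) -> list[list[int]]:
    # Strategy B: one sample per sentence, each padded (or truncated) to MAX_LEN tokens.
    sentences = [s for s in article.split("。") if s]
    return [
        tokenizer(s, padding="max_length", truncation=True, max_length=MAX_LEN)["input_ids"]
        for s in sentences
    ]
```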

> The parameter settings in run_pretrain_bart.sh seem to be for the base version. I modified the parameters for the large version, but after processing the resulting model with convert_ckpt.py, loading the model still throws an error.

I'd like to ask: when you modified the parameters for the large version, what did you set seq-length to, and does it need to match max-position-embedding?
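
As a rough illustration of why seq-length is normally kept no larger than max-position-embeddings: learned position embeddings are a fixed-size lookup table, and changing the table size also changes the weight shape the checkpoint expects at load time. The sizes below are placeholders, not the repository's large-model settings.

```python
# Illustration only: a learned position-embedding table has a fixed size,
# so a --seq-length larger than --max-position-embeddings fails at lookup,
# and changing --max-position-embeddings changes the weight shape the
# checkpoint expects at load time. Sizes below are placeholders.
import torch
import torch.nn as nn

max_position_embeddings = 512   # value the checkpoint was trained with (assumed)
hidden_size = 1024              # BART-large hidden size

pos_emb = nn.Embedding(max_position_embeddings, hidden_size)

position_ids = torch.arange(1024)  # a --seq-length of 1024
# pos_emb(position_ids) would raise an index error for positions >= 512,
# which is why --seq-length is usually set <= --max-position-embeddings.
```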

And I tested another situation: one piece of data (content length of 4070) cycled 493 times. The result is 13111 MiB and 277.93 s for the quantized model, 18611 MiB...

Using vLLM with AWQ models, the speed was indeed faster than with exllama2, but still slower than the unquantized model using vLLM. It seems the quantization didn't work, very...
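
For reference, here is a rough sketch of how the comparison in the two comments above might be reproduced with vLLM: one prompt cycled the same number of times, wall time measured in Python, and GPU memory read from nvidia-smi (since vLLM pre-allocates its KV cache). The model paths, prompt file, and sampling settings are assumptions, not the exact setup used above.

```python
# Rough benchmark sketch: load either the AWQ-quantized or the unquantized model
# in vLLM, cycle one prompt N times, and record wall time. The peak-memory figures
# above would typically come from nvidia-smi, since vLLM pre-allocates its KV cache.
# Model paths, prompt file, and sampling settings are assumptions.
import time
from vllm import LLM, SamplingParams

USE_AWQ = True
llm = LLM(
    model="/path/to/model-awq" if USE_AWQ else "/path/to/model-fp16",
    quantization="awq" if USE_AWQ else None,
)
params = SamplingParams(temperature=0.0, max_tokens=128)

prompt = open("doc_4070_chars.txt").read()  # the ~4070-character document (hypothetical file)

start = time.perf_counter()
for _ in range(493):  # same number of repetitions as the test above
    llm.generate([prompt], params)
print(f"total generation time: {time.perf_counter() - start:.2f} s")
```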