felixstander
When training with the transformers Trainer, the run starts fine without fp16=True, but partway through the loss drops to 0. I believe this is because prepare_model_for_int8_training needs to cast the LayerNorm parameters to float32 for training stability. However, after casting the LayerNorms to float32, I get "expected scalar type Half but found Float". (Note that in BLOOM the LayerNorm modules are named with "layernorm", not "layer_norm".) See the sketch below.
By the way, did the author fine-tune with ZeRO stage 2 on 8x A100 40G? What does the GPU memory usage look like? Is there any chance this could run on 8x V100 16G?
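A minimal sketch of the cast described above (not the author's code), assuming an older peft release whose prepare_model_for_int8_training still accepts a layer_norm_names argument; the checkpoint name is a placeholder. BLOOM's norm layers are called "input_layernorm", "post_attention_layernorm" and "ln_f", so the default "layer_norm" pattern never matches them and they stay in fp16:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_int8_training

# Load the base model in int8 (requires bitsandbytes).
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1",  # placeholder checkpoint
    load_in_8bit=True,
    device_map="auto",
)

# Tell peft which module names to treat as LayerNorm so they are
# cast to float32 (BLOOM-style names, not the default "layer_norm").
model = prepare_model_for_int8_training(
    model,
    layer_norm_names=["layernorm", "ln_f"],
)

# Equivalent manual cast if your peft version lacks layer_norm_names:
for name, param in model.named_parameters():
    if "layernorm" in name or "ln_f" in name:
        param.data = param.data.to(torch.float32)
```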
Any update on this issue? I am facing the same problem.
Has there been an official reply on this? The ability to call various tools is really essential.
Also facing the slow "where" filtering problem. I have around 50k records in one collection.
Support for the Alibaba open-source Qwen model would be wonderful.
> Hey @KrisWongz @felixstander, #103 should add support for Qwen. The base model appears to generate results consistent with the example on Huggingface Hub. Do you have an adapter I...
Does Lorax support the Qwen 4-bit GPTQ version without needing flash attention v2? As far as I can see, all the models you currently support are built on top of flash attention...