Zixi Liu
I ran make_datafiles.py to generate the raw text files for BART preprocessing, but I hit the following issue:

> python make_datafiles.py ./cnn/stories ./dailymail/stories/
> Making bin file for URLs listed in url_lists/all_test.txt...
> Traceback...
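For context, my understanding (an assumption based on the usual CNN/DailyMail preprocessing scripts, not verified against this repo's make_datafiles.py) is that each URL in url_lists/all_test.txt is mapped to a .story file via a SHA-1 hex digest, so a story file missing for any hashed URL can trigger a traceback at this step. A minimal sketch to list missing files under that assumption:

```python
# Sketch (assumes the script's SHA-1 URL-hashing convention): report URLs
# whose corresponding .story file is missing from both story directories.
import hashlib
import os

def hashhex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

with open("url_lists/all_test.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    name = hashhex(url) + ".story"
    if not (os.path.isfile(os.path.join("cnn/stories", name))
            or os.path.isfile(os.path.join("dailymail/stories", name))):
        print("missing:", name)
```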
### 🐛 Describe the bug

PyTorch 1.9 + CUDA 11.1 + NVIDIA A30 GPU. After a successful install, running `colossalai check -i` shows the error below.

```
Traceback (most recent call last):
  File "/home/liuzixi01/.conda/envs/torch-cuda11/bin/colossalai", line 5,...
```
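As a first step in narrowing this down, here is a quick sanity check (my own suggestion, not from the ColossalAI docs) that the installed PyTorch build actually matches the CUDA 11.1 toolkit on the A30 node:

```python
# Environment sanity check: a mismatch between the PyTorch CUDA build and the
# installed driver/toolkit is a common cause of import-time failures in CLIs.
import torch

print(torch.__version__)              # expect 1.9.x
print(torch.version.cuda)             # expect "11.1"
print(torch.cuda.is_available())      # should be True on the A30 node
print(torch.cuda.get_device_name(0))  # should report the A30
```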
I am fine-tuning your model on novel (fiction) data and have tried two input structures; in both cases the training loss starts dropping from 8.8 and the validation perplexity is very large. In the first structure, each line is {"prompt": a randomly clipped span of novel text, 256 characters long, "text": an entire novel with newlines removed and concatenated, roughly 100k characters}. In the second, each line is {"prompt": the 256 characters preceding the text, "text": a 512-character segment of the novel}. Which input structure is correct? From the code it looks like you use the first one, but then why is the loss still so large?
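For reference, this is roughly how I build the second structure (a sketch of my own preprocessing; the field names match the JSON lines above and the 256/512 lengths are the ones I described):

```python
# Sketch: split one novel into 512-char segments, each paired with the
# preceding 256 chars as the prompt (structure 2 from the question).
import json

def make_examples(novel: str, prompt_len: int = 256, seg_len: int = 512):
    text = novel.replace("\n", "")  # drop newlines, as in structure 1
    examples = []
    for start in range(prompt_len, len(text), seg_len):
        examples.append({
            "prompt": text[start - prompt_len:start],
            "text": text[start:start + seg_len],
        })
    return examples

with open("novel.txt", encoding="utf-8") as f:
    for ex in make_examples(f.read()):
        print(json.dumps(ex, ensure_ascii=False))
```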
I downloaded the files from your Google Drive, but I cannot find the raw training data. Could you provide your preprocessing method in detail?
Question about the input/output format
Since the same question went unanswered in a previous issue, I am opening a new one. To the authors: during pretraining, you presumably did not call T5's native _shift_right function, correct? If you had called it, the decoder input would actually start with [PAD]; if you did not, the decoder-side pretraining pattern is: input [CLS]XXXXXXXX, output XXXXXXX[SEP]. Is this understanding right?
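For reference, this is how I understand _shift_right to behave in the Hugging Face T5 implementation (a minimal sketch with a stock T5 checkpoint, not your model; the token IDs below are T5's, not [CLS]/[SEP]):

```python
# Sketch: T5's _shift_right prepends decoder_start_token_id (the pad token for
# T5) and shifts the labels right by one, so the decoder input starts with [PAD].
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
labels = torch.tensor([[100, 200, 300, 1]])  # 1 is T5's </s> (EOS) id
decoder_input_ids = model._shift_right(labels)
print(decoder_input_ids)  # tensor([[  0, 100, 200, 300]]); 0 is <pad>
```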
### 🐛 Describe the bug

Using the Hugging Face model interface to load GLM, I run the training code the same way as the GPT-2 example, with a self-built dataset. The only change is this line: https://github.com/hpcaitech/ColossalAI/blob/dbc01b9c0479a6fd3fb04450b9dc01b5162d8c0d/examples/language/gpt/gemini/train_gpt_demo.py#L342 (see the sketch after the script for what I put there); the train_step function is unmodified. The run script is as follows:

```
set -x
# distplan in ["CAI_ZeRO1", "CAI_ZeRO2", "CAI_Gemini", "Pytorch_DDP", "Pytorch_ZeRO"]
export DISTPLAN=${DISTPLAN:-"CAI_Gemini"}

# The following options only valid when DISTPLAN="colossalai"
export...
```
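What I put at that line is essentially the following (a sketch; the checkpoint name and the trust_remote_code usage follow the GLM model cards on the Hub and are my assumptions, not part of the demo):

```python
# Sketch: replacing the demo's GPT-2 builder with a GLM checkpoint loaded
# through the Hugging Face interface. GLM ships custom modeling code on the
# Hub, hence trust_remote_code=True.
from transformers import AutoModelForSeq2SeqLM

def model_builder():
    return AutoModelForSeq2SeqLM.from_pretrained(
        "THUDM/glm-large-chinese", trust_remote_code=True
    )
```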
### 🐛 Describe the bug

Using the Hugging Face model interface, I run glm-chinese-10b with the GPT-2 gemini demo. I modified the tensor_parallel function and GLM's modeling_glm.py to adapt them to Colossal-AI, and found that layer norm raises: RuntimeError: Expected weight to be of same shape as normalized_shape, but got weight of shape [2048] and normalized_shape = [4096] (see the reproduction sketch below). The run script is as follows:

```
set...
```
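The error itself is easy to reproduce in isolation; it looks like the LayerNorm weight got sharded (4096 → 2048, e.g. across two tensor-parallel ranks) while normalized_shape stayed 4096 (my reading of the shapes, not confirmed):

```python
# Sketch reproducing the exact error: a LayerNorm weight sharded to half the
# hidden size no longer matches normalized_shape.
import torch
import torch.nn.functional as F

x = torch.randn(1, 4096)
sharded_weight = torch.ones(2048)  # half of the 4096-dim weight
F.layer_norm(x, normalized_shape=(4096,), weight=sharded_weight)
# RuntimeError: Expected weight to be of same shape as normalized_shape, ...
```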
I am currently using [merge_llama_with_chinese_lora_to_hf.py](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/merge_llama_with_chinese_lora_to_hf.py) from your main branch to convert HF LLaMA directly. The conversion process: first, run the transformers main branch at commit [151425d](https://github.com/huggingface/transformers/commit/151425ddb29d4ad1a121e8cce62000a2ac52d3ba) to convert the original LLaMA weights to HF format; then run merge_llama_with_chinese_lora_to_hf.py to merge the LoRA weights you published on Hugging Face, producing the final model.

The SHA-256 hashes of my HF LLaMA 13B weights are:

```
sha256sum pytorch_model-00001-of-00003.bin
dd20cdee2637408c6ab88c13f5c27d153bacd0e99f2d55f6a66fbd0269944436 pytorch_model-00001-of-00003.bin
sha256sum pytorch_model-00002-of-00003.bin
1aba886c5f28d2e2e9b04ab3c4f5bc250c6b06efc1baa3f557677b3097f70e6a pytorch_model-00002-of-00003.bin
sha256sum pytorch_model-00003-of-00003.bin
2efc56cddb7877c830bc5b402ee72996aedb5694c9e8007bf1d52d72c0d97d26 pytorch_model-00003-of-00003.bin
```

The hashes of the weights produced by merge_llama_with_chinese_lora_to_hf.py are:

```
sha256sum pytorch_model-00001-of-00003.bin
7b15487829d195378000b331f4f5a9b3c5b45af8ef68caf6b7e625b44f94a958 pytorch_model-00001-of-00003.bin
sha256sum pytorch_model-00002-of-00003.bin...
```
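For anyone who wants to compare against these, this is how I compute the hashes (equivalent to the sha256sum calls above; the shard filename pattern is the one in my output directory):

```python
# Sketch: sha256 of each model shard, equivalent to `sha256sum pytorch_model-*.bin`.
import glob
import hashlib

for path in sorted(glob.glob("pytorch_model-*-of-00003.bin")):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    print(h.hexdigest(), path)
```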
It has been reported that the converted weights in decapoda-research/llama-7b-hf are not compatible with the current transformers release; please give an exact transformers version.
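In the meantime, for anyone reporting compatibility results here, the installed version can be read directly (a trivial check, included so reports in this thread are comparable):

```python
# Print the exact transformers version in the current environment.
import transformers
print(transformers.__version__)
```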