Zixi Liu
I ran make_datafiles.py to generate the raw text files for BART preprocessing, but I hit the following issue:

> python make_datafiles.py ./cnn/stories ./dailymail/stories/
> Making bin file for URLs listed in url_lists/all_test.txt...
> Traceback...
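For context, my understanding (an assumption based on the usual CNN/DailyMail preprocessing scripts, not verified against this repo's make_datafiles.py) is that each URL in url_lists/all_test.txt is mapped to a .story file via a SHA-1 hex digest, so a story file missing for any hashed URL can trigger a traceback at this step. A minimal sketch to list missing files under that assumption:

```python
# Sketch (assumes the script's SHA-1 URL-hashing convention): report URLs
# whose corresponding .story file is missing from both story directories.
import hashlib
import os

def hashhex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

with open("url_lists/all_test.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    name = hashhex(url) + ".story"
    if not (os.path.isfile(os.path.join("cnn/stories", name))
            or os.path.isfile(os.path.join("dailymail/stories", name))):
        print("missing:", name)
```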
### 🐛 Describe the bug

PyTorch 1.9 + CUDA 11.1 + NVIDIA A30 GPU. After a successful install, running `colossalai check -i` shows the error below.

```
Traceback (most recent call last):
  File "/home/liuzixi01/.conda/envs/torch-cuda11/bin/colossalai", line 5,...
```
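As a first step in narrowing this down, here is a quick sanity check (my own suggestion, not from the ColossalAI docs) that the installed PyTorch build actually matches the CUDA 11.1 toolkit on the A30 node:

```python
# Environment sanity check: a mismatch between the PyTorch CUDA build and the
# installed driver/toolkit is a common cause of import-time failures in CLIs.
import torch

print(torch.__version__)              # expect 1.9.x
print(torch.version.cuda)             # expect "11.1"
print(torch.cuda.is_available())      # should be True on the A30 node
print(torch.cuda.get_device_name(0))  # should report the A30
```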
I am fine-tuning your model on novel (fiction) data and have tried two input structures; in both cases the training loss starts dropping from 8.8 and the validation perplexity is very large. In the first structure, each line is {"prompt": a randomly clipped span of novel text, 256 characters long, "text": an entire novel with newlines removed and concatenated, roughly 100k characters}. In the second, each line is {"prompt": the 256 characters preceding the text, "text": a 512-character segment of the novel}. Which input structure is correct? From the code it looks like you use the first one, but then why is the loss still so large?
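For reference, this is roughly how I build the second structure (a sketch of my own preprocessing; the field names match the JSON lines above and the 256/512 lengths are the ones I described):

```python
# Sketch: split one novel into 512-char segments, each paired with the
# preceding 256 chars as the prompt (structure 2 from the question).
import json

def make_examples(novel: str, prompt_len: int = 256, seg_len: int = 512):
    text = novel.replace("\n", "")  # drop newlines, as in structure 1
    examples = []
    for start in range(prompt_len, len(text), seg_len):
        examples.append({
            "prompt": text[start - prompt_len:start],
            "text": text[start:start + seg_len],
        })
    return examples

with open("novel.txt", encoding="utf-8") as f:
    for ex in make_examples(f.read()):
        print(json.dumps(ex, ensure_ascii=False))
```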
I downloaded the files from your Google Drive, but I cannot find the raw training data. Could you provide your preprocessing method in detail?
Question about the input/output format
Since the same question went unanswered in a previous issue, I am opening a new one. To the authors: during pretraining, you presumably did not call T5's native _shift_right function, correct? If you had called it, the decoder input would actually start with [PAD]; if you did not, the decoder-side pretraining pattern is: input [CLS]XXXXXXXX, output XXXXXXX[SEP]. Is this understanding right?
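For reference, this is how I understand _shift_right to behave in the Hugging Face T5 implementation (a minimal sketch with a stock T5 checkpoint, not your model; the token IDs below are T5's, not [CLS]/[SEP]):

```python
# Sketch: T5's _shift_right prepends decoder_start_token_id (the pad token for
# T5) and shifts the labels right by one, so the decoder input starts with [PAD].
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
labels = torch.tensor([[100, 200, 300, 1]])  # 1 is T5's </s> (EOS) id
decoder_input_ids = model._shift_right(labels)
print(decoder_input_ids)  # tensor([[  0, 100, 200, 300]]); 0 is <pad>
```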
### 🐛 Describe the bug

Using the Hugging Face model interface to load GLM, I run the training code the same way as the GPT-2 example, with a self-built dataset. The only change is this line: https://github.com/hpcaitech/ColossalAI/blob/dbc01b9c0479a6fd3fb04450b9dc01b5162d8c0d/examples/language/gpt/gemini/train_gpt_demo.py#L342 (see the sketch after the script for what I put there); the train_step function is unmodified. The run script is as follows:

```
set -x
# distplan in ["CAI_ZeRO1", "CAI_ZeRO2", "CAI_Gemini", "Pytorch_DDP", "Pytorch_ZeRO"]
export DISTPLAN=${DISTPLAN:-"CAI_Gemini"}

# The following options only valid when DISTPLAN="colossalai"
export...
```
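What I put at that line is essentially the following (a sketch; the checkpoint name and the trust_remote_code usage follow the GLM model cards on the Hub and are my assumptions, not part of the demo):

```python
# Sketch: replacing the demo's GPT-2 builder with a GLM checkpoint loaded
# through the Hugging Face interface. GLM ships custom modeling code on the
# Hub, hence trust_remote_code=True.
from transformers import AutoModelForSeq2SeqLM

def model_builder():
    return AutoModelForSeq2SeqLM.from_pretrained(
        "THUDM/glm-large-chinese", trust_remote_code=True
    )
```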
### 🐛 Describe the bug

Using the Hugging Face model interface, I run glm-chinese-10b with the GPT-2 gemini demo. I modified the tensor_parallel function and GLM's modeling_glm.py to adapt them to Colossal-AI, and found that layer norm raises: RuntimeError: Expected weight to be of same shape as normalized_shape, but got weight of shape [2048] and normalized_shape = [4096] (see the reproduction sketch below). The run script is as follows:

```
set...
```
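The error itself is easy to reproduce in isolation; it looks like the LayerNorm weight got sharded (4096 → 2048, e.g. across two tensor-parallel ranks) while normalized_shape stayed 4096 (my reading of the shapes, not confirmed):

```python
# Sketch reproducing the exact error: a LayerNorm weight sharded to half the
# hidden size no longer matches normalized_shape.
import torch
import torch.nn.functional as F

x = torch.randn(1, 4096)
sharded_weight = torch.ones(2048)  # half of the 4096-dim weight
F.layer_norm(x, normalized_shape=(4096,), weight=sharded_weight)
# RuntimeError: Expected weight to be of same shape as normalized_shape, ...
```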
I am currently using [merge_llama_with_chinese_lora_to_hf.py](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/merge_llama_with_chinese_lora_to_hf.py) from your main branch to convert HF LLaMA directly. The conversion process: first, run the transformers main branch at commit [151425d](https://github.com/huggingface/transformers/commit/151425ddb29d4ad1a121e8cce62000a2ac52d3ba) to convert the original LLaMA weights to HF format; then run merge_llama_with_chinese_lora_to_hf.py to merge the LoRA weights you published on Hugging Face, producing the final model.

The SHA-256 hashes of my HF LLaMA 13B weights are:

```
sha256sum pytorch_model-00001-of-00003.bin
dd20cdee2637408c6ab88c13f5c27d153bacd0e99f2d55f6a66fbd0269944436 pytorch_model-00001-of-00003.bin
sha256sum pytorch_model-00002-of-00003.bin
1aba886c5f28d2e2e9b04ab3c4f5bc250c6b06efc1baa3f557677b3097f70e6a pytorch_model-00002-of-00003.bin
sha256sum pytorch_model-00003-of-00003.bin
2efc56cddb7877c830bc5b402ee72996aedb5694c9e8007bf1d52d72c0d97d26 pytorch_model-00003-of-00003.bin
```

The hashes of the weights produced by merge_llama_with_chinese_lora_to_hf.py are:

```
sha256sum pytorch_model-00001-of-00003.bin
7b15487829d195378000b331f4f5a9b3c5b45af8ef68caf6b7e625b44f94a958 pytorch_model-00001-of-00003.bin
sha256sum pytorch_model-00002-of-00003.bin...
```
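For anyone who wants to compare against these, this is how I compute the hashes (equivalent to the sha256sum calls above; the shard filename pattern is the one in my output directory):

```python
# Sketch: sha256 of each model shard, equivalent to `sha256sum pytorch_model-*.bin`.
import glob
import hashlib

for path in sorted(glob.glob("pytorch_model-*-of-00003.bin")):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    print(h.hexdigest(), path)
```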
It has been reported that the converted weights in decapoda-research/llama-7b-hf are not compatible with the current transformers release; please give an exact transformers version.
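In the meantime, for anyone reporting compatibility results here, the installed version can be read directly (a trivial check, included so reports in this thread are comparable):

```python
# Print the exact transformers version in the current environment.
import transformers
print(transformers.__version__)
```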