hjc3613
Trained on 30,000,000 parallel English and Chinese sentences; after 100,000 training steps, the prediction result:  content of config.yaml: # wmt14_en_de.yaml save_data: data/wmt/run/example # Corpus opts: data: corpus_1: path_src: corpus/corpus_1/en-zh.en.bpe...
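For context, here is a minimal sketch of what such an OpenNMT-py data configuration might look like when written out from Python; apart from `save_data`, `path_src`, and the 100,000-step count taken from the report above, the corpus paths, vocabulary files, and step settings are assumptions, not values from the original issue:

```python
# Hypothetical reconstruction of a minimal OpenNMT-py YAML config for an
# English-Chinese corpus; all paths marked "assumed" are placeholders.
import yaml

config = {
    "save_data": "data/wmt/run/example",                # from the report
    "src_vocab": "data/wmt/run/example.vocab.src",      # assumed location
    "tgt_vocab": "data/wmt/run/example.vocab.tgt",      # assumed location
    "data": {
        "corpus_1": {
            "path_src": "corpus/corpus_1/en-zh.en.bpe",  # from the report
            "path_tgt": "corpus/corpus_1/en-zh.zh.bpe",  # assumed counterpart
        },
        "valid": {
            "path_src": "corpus/valid/valid.en.bpe",     # assumed
            "path_tgt": "corpus/valid/valid.zh.bpe",     # assumed
        },
    },
    "save_model": "data/wmt/run/model",   # assumed
    "train_steps": 100000,                # matches the 100,000 steps mentioned
    "valid_steps": 5000,                  # assumed
}

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```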
Hello, could you provide a download link for the NYT10 dataset in JSON format? The files downloaded from the official site are all .pb (protobuf) files.
After fine-tuning, neither an adapter_config.json nor a config.json file was generated, so merging fails with the error: Can't find adapter_config.json - [√] **Base model**: Alpaca-Plus - [√] **Operating system**: Linux - [√] **Issue category**: model conversion and merging / model training and fine-tuning / - [√] (Required) Since the related dependencies are updated frequently, please make sure you have followed the steps in the [Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki) - [√] (Required) I have read the [FAQ section](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/常见问题) and searched the issues without finding a similar problem or solution - [...
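For reference, adapter_config.json is normally written when the PEFT-wrapped model itself is saved; below is a minimal sketch assuming a LoRA setup with Hugging Face peft (the model path, target modules, and output directory are illustrative placeholders, not taken from the original script):

```python
# Minimal sketch: saving a LoRA adapter with Hugging Face peft normally
# produces adapter_config.json alongside the adapter weights.
# Model path, target modules, and output path are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/base_model")
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed for a LLaMA-style model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# ... training would happen here ...

# save_pretrained on the PEFT-wrapped model writes adapter_config.json and
# the adapter weights; the merge step looks for exactly that file.
model.save_pretrained("output/sft_lora_model")
```

If the training script only saves the base model's state dict, that file never gets written, which matches the merge error reported above.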
### Detailed problem description Running run_sft.sh raises an error:  Note: I am using DeepSpeed stage 3, with the following configuration:  The content of run_sft.sh is:  The only difference from the original run_sft.sh is that I removed peft_path so that the model is re-initialized; everything else is essentially unchanged. ### Required checklist (of the first three items, keep only the one you are asking about) - [√] **Base model**: Alpaca-Plus - [√] **Operating system**: Linux - [√] **Issue category**: model training and fine-tuning - [√]...
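For comparison, here is a minimal sketch of a ZeRO stage-3 DeepSpeed configuration of the kind typically passed to the Hugging Face Trainer via `--deepspeed`; the batch sizes and offload settings are assumptions, not the values from the original run:

```python
# Hypothetical ZeRO stage-3 configuration; write it to ds_zero3.json and pass
# it to the launcher. Batch sizes and offload choices are assumptions.
import json

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},   # assumed: CPU offload
        "offload_param": {"device": "cpu"},       # assumed: CPU offload
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": 8,      # assumed
    "train_micro_batch_size_per_gpu": 1,   # assumed
    "gradient_clipping": 1.0,
}

with open("ds_zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```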
### System Info Dockerfile:  ### Information - [ ] The official example scripts - [ ] My own modified scripts ### 🐛 Describe the bug When freezing the top...
**Training a 70B model costs too much GPU memory** When training a 70B+ model, the GPU memory cost is very large. As mentioned in the DeepSpeed documentation (deepspeed-readthedocs-io-en-stable.pdf), the total memory was about...
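As a rough back-of-the-envelope check, the ZeRO paper's accounting for mixed-precision Adam puts the model states at about 16 bytes per parameter (fp16 weights, fp16 gradients, and fp32 optimizer states), which ZeRO-3 partitions across GPUs. The sketch below works that arithmetic out for a 70B-parameter model; the GPU counts are assumptions, and activations, buffers, and fragmentation are not included:

```python
# Rough ZeRO-3 model-state memory estimate for a 70B-parameter model,
# using the ZeRO paper's 2 + 2 + 12 = 16 bytes/parameter breakdown
# (fp16 weights + fp16 grads + fp32 Adam states). Activations, buffers,
# and fragmentation are excluded; the GPU counts below are assumptions.
params = 70e9
bytes_per_param = 2 + 2 + 12
total_model_states_gb = params * bytes_per_param / 1024**3

for num_gpus in (8, 16, 64):
    per_gpu_gb = total_model_states_gb / num_gpus   # ZeRO-3 shards all three states
    print(f"{num_gpus:>3} GPUs: ~{per_gpu_gb:,.0f} GB of model states per GPU")

# On 8 GPUs this is roughly 130 GB per GPU before activations, which is why
# CPU/NVMe offload or a larger cluster is typically needed at the 70B scale.
```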
Could you please provide details about the implementation of the TRICE algorithm? The code in https://github.com/google-research/cascades/blob/main/cascades/examples/notebooks/trice.ipynb has a few unimplemented functions, and I want to refer back to the original implementation. Thanks...
## Is your feature request related to a problem? Please describe. (A clear and concise description of what the problem is.) When I generate 20 s of audio, the time cost is...
Notice: In order to resolve issues more efficiently, please raise the issue following the template. (Note: to resolve your issue more efficiently, please ask according to the template and add details.) ## ❓ Questions and Help ### Before asking: 1. search the issues. 2. search the...