彭齐荣

3 issues by 彭齐荣

20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_3 locally
20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_0 locally
20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_1 locally
20/11/12 15:21:17 INFO BlockManager: Found block...

With 8 V100 GPUs, ZeRO-3 enabled, and TP=1, PP=1, DP=8, loading the llama 70B model with LlamaForCausalLM.from_pretrained causes an OOM (host RAM is exhausted, not GPU memory). The machine has 512 GB of physical memory. The cause is in the dev branch, base.py line 304: state_dict = {} if not is_zero3_enabled(config) or env.dp_rank == 0 \ or config.low_cpu_mem_usage or config.quantization_config.load_in_8bit \...
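As a rough back-of-the-envelope check on why 512 GB of host RAM might not be enough, the sketch below estimates the memory needed to hold the checkpoint on the CPU. It assumes 70 billion parameters stored in fp16 (2 bytes each) and, as a labeled assumption that the truncated snippet above does not confirm, that each of the 8 data-parallel processes materializes its own full copy of the state dict. The helper `host_ram_needed_gb` is hypothetical and is not part of the library.

```python
# Hypothetical helper for illustration only; not part of the library's API.
def host_ram_needed_gb(n_params: int, bytes_per_param: int, n_procs_loading: int) -> float:
    """Estimate host RAM (in GB, 1 GB = 1e9 bytes) needed when
    n_procs_loading processes each hold a full copy of the weights."""
    return n_params * bytes_per_param * n_procs_loading / 1e9

N_PARAMS = 70_000_000_000   # assumption: roughly 70B parameters
BYTES_FP16 = 2              # assumption: weights held in fp16
PHYSICAL_RAM_GB = 512       # from the report
DP_SIZE = 8                 # from the report (DP=8)

# Only one process holds the full state dict.
one_copy_gb = host_ram_needed_gb(N_PARAMS, BYTES_FP16, 1)

# Assumption: every data-parallel process holds its own full copy.
all_ranks_gb = host_ram_needed_gb(N_PARAMS, BYTES_FP16, DP_SIZE)

print(f"one copy: {one_copy_gb:.0f} GB, fits: {one_copy_gb <= PHYSICAL_RAM_GB}")
print(f"{DP_SIZE} copies: {all_ranks_gb:.0f} GB, fits: {all_ranks_gb <= PHYSICAL_RAM_GB}")
```

Under these assumptions a single copy fits comfortably, while eight simultaneous copies would need more than twice the available memory. The estimate ignores temporary buffers and any fp32 upcasting during loading, so real usage could be higher.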

bug

Training llama2 70B with the latest dev branch code runs into the following problem: │collie/collie/models/llama/model.py:203 in _forward │ │ │ │ 200 │ │ │ │ │ │ │ .permute(0, 2, 1, 4, 3) \ │ │ 201 │...