Zixi Liu
@littlesulley Hi, I have run into exactly the same problem. Are you planning to improve the code in the near future? @JaeZheng I hit the same issue as well; have you managed to solve it on your end?
Question for the author: during pretraining you did not call T5's native `_shift_right` function, right? Because if you had, the decoder sequence would actually start with [PAD]; if you did not, the decoder's pretraining pattern is: input [CLS]XXXXXXXX, output XXXXXXX[SEP]. Is that the correct understanding?

> * Yes, sequences start with [CLS] and end with [SEP].
> * With enough data it probably does not matter much, but directly setting your own SOS/EOS is not recommended; when using a pretrained model you should keep the settings consistent with pretraining.
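For concreteness, here is a minimal sketch of the two decoder layouts on a toy label sequence; the token ids and the pad/decoder-start choice are invented for illustration and are not taken from the project's config:

```
import torch

# Toy "labels": [CLS] x1 x2 x3 [SEP]  (ids are made up for illustration only)
cls_id, sep_id, pad_id = 101, 102, 0
labels = torch.tensor([[cls_id, 11, 12, 13, sep_id]])

# Case 1: T5's native _shift_right. The decoder input is the labels shifted
# right by one with decoder_start_token_id prepended; for vanilla T5 that id
# equals the [PAD] id, which is why the decoder would start with [PAD].
shifted = labels.new_full(labels.shape, pad_id)
shifted[:, 1:] = labels[:, :-1]
shifted[:, 0] = pad_id            # decoder_start_token_id (== pad id for T5)
# decoder input : [PAD] [CLS] x1 x2 x3
# decoder target: [CLS] x1 x2 x3 [SEP]

# Case 2: no extra shift, as described above. The [CLS]/[SEP] already present
# in the data act as the start/end markers.
decoder_input_ids = labels[:, :-1]   # [CLS] x1 x2 x3
decoder_targets   = labels[:, 1:]    # x1 x2 x3 [SEP]
```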
> So which model did you merge, LLaMA-13B or Alpaca-13B?

I merged the Alpaca model.
> On my side, the LLaMA weights were downloaded from Meta and the Chinese Alpaca weights were downloaded from HF; the hashes all check out.
> > On my side, the LLaMA weights were downloaded from Meta and the Chinese Alpaca weights were downloaded from HF; the hashes all check out.
>
> Since peft has changed a lot, in most cases the problem is with peft; we suggest updating peft and trying again with the new merge script.

I looked at other issues and they used peft 0.2.0, so I still think you need to publish the exact peft version and hashes you used; otherwise there is really no way to reproduce your results.
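For what it's worth, this is the kind of plain peft-based merge I would use to sanity-check the weights. It is not the project's own merge script, the checkpoint paths are placeholders, and `merge_and_unload` needs a reasonably recent peft release:

```
import peft
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

print("peft version:", peft.__version__)  # record the exact version used for the merge

# The paths below are placeholders for wherever your converted weights live.
base = LlamaForCausalLM.from_pretrained("path/to/llama-13b-hf", torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "path/to/chinese-alpaca-lora-13b")

merged = lora.merge_and_unload()          # fold the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged-alpaca-13b")
```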
```
/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu:329:345:   required from here
/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/colossalai/kernel/cuda_native/csrc/multi_tensor_apply.cuh:104:150: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
      multi_tensor_apply_kernel(
      ^
/home/liuzixi01/.conda/envs/torch-cuda116/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:235:1: note: declared here
...
```
> I resolved this problem by reinstalling, and I will close the issue. Thanks.
tensor_parallelize is modified as follows:

```
def tensor_parallelize(model: torch.nn.Module, pg: ProcessGroup):
    """tensor_parallelize
    Sharding the Model Parameters.

    Args:
        model (torch.nn.Module): a torch module to be sharded
    """
    for mn, module in model.named_modules():
        if mn == '':
            continue
        ...
```
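For reference, the helpers such a loop usually relies on follow the pattern in the ColossalAI GPT example: each parameter gets a 1D shard spec along either the row or the column dimension of the tensor-parallel group. A minimal sketch of those helpers (how they map onto GLM's specific module names is my assumption, not something shown in the snippet above):

```
from colossalai.tensor import ComputePattern, ComputeSpec, ProcessGroup, ShardSpec

def split_param_single_dim_tp1d(dim: int, param, pg: ProcessGroup):
    # Shard the parameter 1D along `dim` across the tensor-parallel ranks.
    spec = (ShardSpec([dim], [pg.tp_world_size()]), ComputeSpec(ComputePattern.TP1D))
    param.set_tensor_spec(*spec)

def split_param_row_tp1d(param, pg: ProcessGroup):
    split_param_single_dim_tp1d(0, param, pg)

def split_param_col_tp1d(param, pg: ProcessGroup):
    split_param_single_dim_tp1d(-1, param, pg)
```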
I load my model in this manner:

```
with ColoInitContext(device=get_current_device(), dtype=torch.half,
                     default_dist_spec=default_dist_spec, default_pg=shard_pg):
    # model = model_builder(args.model_type)(checkpoint=True)
    model = GLMForConditionalGeneration.from_pretrained('THUDM/glm-10b-chinese', trust_remote_code=True)

tp_pg = ProcessGroup(tp_degree=2)
tensor_parallelize(model, tp_pg)
```
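In case it helps, this is how I would expect `shard_pg` and `default_dist_spec` to be constructed, following the ColossalAI Gemini examples; the world-size handling is an assumption about the launch setup, and the import path of `ColoInitContext` differs between colossalai releases:

```
import torch
from colossalai.tensor import ProcessGroup, ShardSpec

world_size = torch.distributed.get_world_size()    # assumes the process group is already initialized
shard_pg = ProcessGroup(tp_degree=world_size)      # group used while materializing the weights
default_dist_spec = ShardSpec([-1], [world_size])  # shard every parameter along its last dim at init
```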