Unkn0wn

9 comments by Unkn0wn

The dataset is a 26 GB Chinese text corpus.

Thank you for your response. I will train the tokenizer with SentencePiece and convert the model to a T5Tokenizer. Thank you!

Either works. Unlike the previous step, where attending to different pseudo sentences learns a new representation, the attention layer here mostly serves to map both sides back into as similar a space as possible for contrastive learning. Since this layer acts more as a reshape, I simply reused it.
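A minimal PyTorch sketch of the idea, reusing one shared layer to map both views into the same space before the contrastive loss. This is illustrative only: the `nn.Linear` stands in for the reused attention layer, and the class and parameter names are hypothetical, not from the PT-BERT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedMappingContrast(nn.Module):
    """Map two views with the *same* layer, then apply an InfoNCE-style loss.

    Hypothetical sketch: `proj` stands in for the reused attention layer,
    whose role here is mainly to reshape both views into one shared space.
    """

    def __init__(self, hidden: int, temperature: float = 0.05):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)  # shared mapping for both views
        self.temperature = temperature

    def forward(self, z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
        # Because the mapping is shared, h1 and h2 land in the same space.
        h1 = self.proj(z1)
        h2 = self.proj(z2)
        # Pairwise cosine similarities; diagonal entries are the positives.
        sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1)
        labels = torch.arange(z1.size(0))
        return F.cross_entropy(sim / self.temperature, labels)
```

Reusing a single layer (rather than training two separate projections) keeps the two representations comparable, which is exactly what a contrastive objective needs.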

> A space is missing before the trailing `\` at the end of this line, which breaks argument parsing; please fix it:
>
> https://github.com/Namco0816/PT-BERT/blob/dba403c01aa2acdf8659b7a7167d41471aacb656/scripts/re-produce_result.sh#L12

Thanks! I'm currently on vacation and don't have a device I can edit on. I'll fix it as soon as I'm back.

I've already uploaded all the code in this repo. I will add a README file once I finish the work that is currently in progress.

Hello, the main codebase is modified from SimCSE, so it runs directly in the SimCSE environment.

I will add a requirements file to the repo. Thanks for your feedback!

Thanks for your contribution! I've also implemented a version of the GRPO trainer; instead of using a for loop in https://github.com/saisurbehera/trl/blob/grpo/trl/trainer/grpo_trainer.py#L380, I directly view the tensor as (-1, sampling_group_size) and calculate the...
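The reshape trick described above can be sketched as follows. This is a minimal stand-alone version, not the trl implementation: the function name is hypothetical, and the per-group normalization (mean and standard deviation) is the usual GRPO-style advantage, stated here as an assumption.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, group_size: int,
                              eps: float = 1e-8) -> torch.Tensor:
    """Vectorized group-relative advantages.

    Instead of looping over sampling groups, view the flat reward tensor
    as (num_prompts, group_size) and normalize each row in one shot.
    """
    grouped = rewards.view(-1, group_size)           # (num_prompts, group_size)
    mean = grouped.mean(dim=1, keepdim=True)
    std = grouped.std(dim=1, keepdim=True)           # per-group std (unbiased)
    advantages = (grouped - mean) / (std + eps)      # normalize within groups
    return advantages.view(-1)                       # back to the flat layout
```

The `view` calls are free (no copy), so this removes the Python-level loop entirely while computing the same per-group statistics.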

It seems that I encountered the same issue. I also use a dummy reward model which does not take any GPU memory, and the training goes smoothly at the...