schicksa comments

Results: 3 comments of schicksa

(ITA)How to concat train dataset and whether the visual context is also adopted on the dev dataset or test dataset

@wangxinyu0922 @257556227 I also have a question about the EOS token, which is often used in generative frameworks such as GPT, and it passes through the `Tokenizer.convert_tokens_to_ids` of bert...

(ITA)How to concat train dataset and whether the visual context is also adopted on the dev dataset or test dataset

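You may have misunderstood me. What I am asking is what the EOS token actually does. With the RoBERTa tokenizer, segment_id should be 0 for every token, so isn't separating the inputs with an EOS token purely symbolic, with no practical effect? So far I have only found the EOS token in these two places in the code:

```
if '' in words:
    eos_id = words.index('')
    current_sentence.chunk_sentence(0,eos_id) #1

tokens = tokenizer.tokenize(s + "") #2
```

The first extracts the sentence before the first EOS token, which is the data in UMT. The second is only a test: it is the result of tokenizing a sentence with the EOS token appended. So I think its only purpose is to let the model learn during training that these tokens are separators.

**By the way, one more question: how is Cross-view-alignment implemented? The forward-backward-score part is nested quite deeply and I could not really follow it.**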
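To make the first snippet concrete, here is a minimal sketch of splitting a word list at the first separator token. It is not the repository's code: the helper name `split_at_eos` and the separator string `"<EOS>"` are assumptions made for illustration, and the real token and the `chunk_sentence` method may behave differently.

```python
EOS = "<EOS>"  # assumed separator string; the token used in the repository may differ


def split_at_eos(words, eos=EOS):
    """Split a word list at the first separator (hypothetical helper).

    Returns (sentence_words, context_words). If no separator is present,
    the whole input is treated as the sentence and the context is empty.
    """
    if eos in words:
        eos_id = words.index(eos)  # position of the first separator
        return words[:eos_id], words[eos_id + 1:]
    return list(words), []


words = ["Kevin", "Durant", "enters", EOS, "image", "caption", "text"]
sentence, context = split_at_eos(words)
print(sentence)  # the words before the separator
print(context)   # the words after the separator
```

Here the separator only marks a position in the word list. Whether the model treats the two parts differently depends on what it learns during training, which is the point raised in the comment.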

(ITA)How to concat train dataset and whether the visual context is also adopted on the dev dataset or test dataset

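> > You may have misunderstood me. What I am asking is what the EOS token actually does. With the RoBERTa tokenizer, segment_id should be 0 for every token, so isn't separating the inputs with an EOS token purely symbolic, with no practical effect? So far I have only found the EOS token in these two places in the code:
> >
> > ```
> > if '' in words:
> >     eos_id = words.index('')
> >     current_sentence.chunk_sentence(0,eos_id) #1
> >
> > tokens...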