GeneZC

60 comments by GeneZC

While we have built a tokenizer for the test set, the corresponding embeddings will be randomly initialized for words that do not exist in the pre-trained word embeddings. That is, if a...
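For concreteness, here is a minimal sketch of that initialization, assuming the pre-trained vectors are loaded into a `dict`; `build_embedding_matrix` and its parameters are hypothetical names, not this repo's actual code:

```python
import numpy as np

def build_embedding_matrix(word2idx, pretrained, embed_dim=300, seed=1234):
    """Build an embedding matrix; words missing from the pre-trained
    vectors are randomly initialized (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    matrix = np.zeros((len(word2idx) + 1, embed_dim), dtype=np.float32)
    for word, idx in word2idx.items():
        vec = pretrained.get(word)  # None for out-of-vocabulary words
        if vec is None:
            # OOV word: sample a random vector instead of a pre-trained one
            vec = rng.uniform(-0.25, 0.25, embed_dim).astype(np.float32)
        matrix[idx] = vec
    return matrix
```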

Not yet. To preserve the structure of `train.py`, we did not introduce the additional constraints on the loss function mentioned in the paper. That said, we found that without this part, the accuracy differs from the results reported in the paper by only about 1%.

You should write your own loss function to replace the original one in `train.py`, i.e., `criterion`.
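A minimal sketch of what that replacement could look like, assuming a standard PyTorch training loop; `ConstrainedLoss` and `extra_penalty` are hypothetical, since the paper's exact constraint is not reproduced here:

```python
import torch.nn as nn

class ConstrainedLoss(nn.Module):
    """Hypothetical drop-in replacement for the original criterion:
    cross-entropy plus an additional weighted constraint term."""
    def __init__(self, weight=0.1):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.weight = weight

    def forward(self, logits, targets, extra_penalty=None):
        loss = self.ce(logits, targets)
        if extra_penalty is not None:
            # the paper's extra constraint on the loss would go here
            loss = loss + self.weight * extra_penalty
        return loss

# in train.py, replace e.g.
#   criterion = nn.CrossEntropyLoss()
# with:
#   criterion = ConstrainedLoss(weight=0.1)
```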

In my experience, only two hyperparameters are particularly sensitive: `batch_size=64` and `learning_rate=0.001`.
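If it helps, a sketch of how those two settings would appear in a typical `argparse`-based `train.py` (the flag names are assumptions, not necessarily this repo's):

```python
import argparse

parser = argparse.ArgumentParser()
# the two settings that mattered most in my runs
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--learning_rate', type=float, default=0.001)
opt = parser.parse_args()
```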

It seems that the dependency matrix has not been processed properly. Could you please give the details of your example?
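For reference, a sketch of how a dependency adjacency matrix is commonly built with spaCy (self-loops plus symmetric head-child edges); this is an assumption about the preprocessing, not necessarily this repo's exact code:

```python
import numpy as np
import spacy

nlp = spacy.load('en_core_web_sm')

def dependency_adj_matrix(text):
    """Symmetric adjacency matrix from the dependency parse, with self-loops."""
    doc = nlp(text)
    n = len(doc)
    adj = np.zeros((n, n), dtype=np.float32)
    for token in doc:
        adj[token.i][token.i] = 1.0        # self-loop
        adj[token.i][token.head.i] = 1.0   # token -> its head
        adj[token.head.i][token.i] = 1.0   # head -> token (symmetric)
    return adj
```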

Maybe I will consider releasing the model parameters based on my own experiments later.

Hi, this repo concentrates on the setting where the aspect terms are given. If you would like to do it without provided aspect terms, you could refer to end to...

@ZhangYikaii In principle, as long as `seq_len` is the maximum length over all the data, there should be no problem?

I found that this issue reports a similar error: #114

My own analysis: suppose the maximum length over all texts is 83 (just as an example), while the maximum text length set in this repo is 80. Some texts therefore get truncated (and when the aspect sits at the end, the aspect itself is partially cut off), which loses text information. When processing, I compute `pos_inx` from the length of the text to the left of the aspect plus the length of the aspect itself, so `pos_inx[i][1]` may exceed `seq_len`. In summary, I think the more reasonable fix is to set `max_len` to the maximum length over all texts (i.e., the `seq_len` in this repo); this not only resolves the out-of-bounds problem above but also avoids losing text information.
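A small sketch of that failure mode under the example above (the helper name is hypothetical): with `max_len=80` and an aspect at the very end of an 83-token sentence, the computed end index exceeds the limit, whereas setting `max_len` to the true maximum (83) does not:

```python
def aspect_position(left_len, aspect_len):
    """pos_inx as described above: start/end indices of the aspect,
    from the left-context length plus the aspect length (hypothetical)."""
    return left_len, left_len + aspect_len

max_len = 80                              # the repo's fixed limit
start, end = aspect_position(left_len=80, aspect_len=3)
assert end > max_len                      # 83 > 80: index out of bounds
assert end <= 83                          # fine once max_len is the true maximum
```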