GeneZC

60 comments by GeneZC

While we have built a tokenizer for the test set, the corresponding embeddings will be randomly initialized for words that do not exist in the pre-trained word embeddings. That is, if a...
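For concreteness, here is a minimal sketch of that initialization, assuming the pre-trained vectors are loaded into a `dict`; `build_embedding_matrix` and its parameters are hypothetical names, not this repo's actual code:

```python
import numpy as np

def build_embedding_matrix(word2idx, pretrained, embed_dim=300, seed=1234):
    """Build an embedding matrix; words missing from the pre-trained
    vectors are randomly initialized (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    matrix = np.zeros((len(word2idx) + 1, embed_dim), dtype=np.float32)
    for word, idx in word2idx.items():
        vec = pretrained.get(word)  # None for out-of-vocabulary words
        if vec is None:
            # OOV word: sample a random vector instead of a pre-trained one
            vec = rng.uniform(-0.25, 0.25, embed_dim).astype(np.float32)
        matrix[idx] = vec
    return matrix
```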

Not yet. To preserve the structure of `train.py`, we did not introduce the additional constraints on the loss function mentioned in the paper. That said, we found that without this part, the accuracy differs from the results reported in the paper by only about 1%.

You should write your own loss function to replace the original one in `train.py`, i.e., `criterion`.
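A minimal sketch of what that replacement could look like, assuming a standard PyTorch training loop; `ConstrainedLoss` and `extra_penalty` are hypothetical, since the paper's exact constraint is not reproduced here:

```python
import torch.nn as nn

class ConstrainedLoss(nn.Module):
    """Hypothetical drop-in replacement for the original criterion:
    cross-entropy plus an additional weighted constraint term."""
    def __init__(self, weight=0.1):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.weight = weight

    def forward(self, logits, targets, extra_penalty=None):
        loss = self.ce(logits, targets)
        if extra_penalty is not None:
            # the paper's extra constraint on the loss would go here
            loss = loss + self.weight * extra_penalty
        return loss

# in train.py, replace e.g.
#   criterion = nn.CrossEntropyLoss()
# with:
#   criterion = ConstrainedLoss(weight=0.1)
```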

In my experience, only two hyperparameters are particularly sensitive: `batch_size=64` and `learning_rate=0.001`.
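If it helps, a sketch of how those two settings would appear in a typical `argparse`-based `train.py` (the flag names are assumptions, not necessarily this repo's):

```python
import argparse

parser = argparse.ArgumentParser()
# the two settings that mattered most in my runs
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--learning_rate', type=float, default=0.001)
opt = parser.parse_args()
```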

It seems that the dependency matrix has not been processed properly. Could you please give the details of your example?
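For reference, a sketch of how a dependency adjacency matrix is commonly built with spaCy (self-loops plus symmetric head-child edges); this is an assumption about the preprocessing, not necessarily this repo's exact code:

```python
import numpy as np
import spacy

nlp = spacy.load('en_core_web_sm')

def dependency_adj_matrix(text):
    """Symmetric adjacency matrix from the dependency parse, with self-loops."""
    doc = nlp(text)
    n = len(doc)
    adj = np.zeros((n, n), dtype=np.float32)
    for token in doc:
        adj[token.i][token.i] = 1.0        # self-loop
        adj[token.i][token.head.i] = 1.0   # token -> its head
        adj[token.head.i][token.i] = 1.0   # head -> token (symmetric)
    return adj
```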

Maybe I will consider releasing the model parameters based on my own experiments later.

Hi, this repo concentrates on the setting where the aspect terms are given. If you would like to do it without provided aspect terms, you could refer to end to...

@ZhangYikaii In principle, as long as `seq_len` is the maximum length over all the data, there should be no problem?

I found that this issue reports a similar error: #114

My own analysis: suppose the maximum length over all texts is 83 (just as an example), while the maximum text length set in this repo is 80. Some texts therefore get truncated (and when the aspect sits at the end, the aspect itself is partially cut off), which loses text information. When processing, I compute `pos_inx` from the length of the text to the left of the aspect plus the length of the aspect itself, so `pos_inx[i][1]` may exceed `seq_len`. In summary, I think the more reasonable fix is to set `max_len` to the maximum length over all texts (i.e., the `seq_len` in this repo); this not only resolves the out-of-bounds problem above but also avoids losing text information.
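A small sketch of that failure mode under the example above (the helper name is hypothetical): with `max_len=80` and an aspect at the very end of an 83-token sentence, the computed end index exceeds the limit, whereas setting `max_len` to the true maximum (83) does not:

```python
def aspect_position(left_len, aspect_len):
    """pos_inx as described above: start/end indices of the aspect,
    from the left-context length plus the aspect length (hypothetical)."""
    return left_len, left_len + aspect_len

max_len = 80                              # the repo's fixed limit
start, end = aspect_position(left_len=80, aspect_len=3)
assert end > max_len                      # 83 > 80: index out of bounds
assert end <= 83                          # fine once max_len is the true maximum
```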