
Don't we need to reconcile SpaCy and BERT tokens?

Open hjpark2017 opened this issue 3 years ago • 2 comments

First of all, thank you for releasing the code from your paper. What I'm curious about is that SpaCy splits sentences into word-level tokens, while BERT splits them into WordPiece units, so the two sets of tokens may not map onto each other exactly. Which part of the released code handles this mismatch?

hjpark2017 avatar Oct 01 '22 06:10 hjpark2017


Hi, thanks for your question. I agree that SpaCy splits sentences into word units while BERT splits them into WordPiece units, so for a small number of samples the tokens are incongruent in SenticGCN-BERT. For the datasets used in this work, however, the tokenizations agree for most samples, so we do not handle this mismatch explicitly. You could certainly align the words with the BERT WordPiece units for better results. Please let me know if there is any problem. Thanks!!!
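(Editor's note: the alignment the reply mentions is not part of the released code; below is a minimal, self-contained sketch of one common way to do it. It assumes lowercase inputs matching an uncased BERT vocabulary and that continuation pieces carry the standard `##` prefix; the function name `align_wordpieces` is hypothetical, not from the repository.)

```python
def align_wordpieces(words, wordpieces):
    """Map each word to the span of WordPiece indices that covers it.

    Assumes continuation pieces start with '##' and that concatenating a
    word's pieces (with '##' stripped) reproduces the word exactly.
    Returns a list of (start, end) half-open index ranges into `wordpieces`,
    one per entry in `words`.
    """
    spans = []
    i = 0
    for word in words:
        start = i
        built = wordpieces[i]  # first piece of the word has no '##' prefix
        i += 1
        # Greedily consume continuation pieces until the word is rebuilt.
        while built != word and i < len(wordpieces):
            piece = wordpieces[i]
            built += piece[2:] if piece.startswith("##") else piece
            i += 1
        spans.append((start, i))
    return spans


# Example: SpaCy-style word tokens vs. BERT-style WordPiece tokens.
words = ["sentic", "graph", "convolution"]
wordpieces = ["sen", "##tic", "graph", "con", "##vo", "##lution"]
print(align_wordpieces(words, wordpieces))  # → [(0, 2), (2, 3), (3, 6)]
```

With these spans, a word-level representation can be recovered from BERT output by, for example, taking the first WordPiece embedding of each span or averaging over the span, so that the dependency graph built from SpaCy tokens lines up with the BERT features. With Hugging Face fast tokenizers, `BatchEncoding.word_ids()` gives the same alignment directly.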

BinLiang-NLP avatar Oct 06 '22 13:10 BinLiang-NLP

I'm sorry for the late greeting. Thank you for your kind explanation!

hjpark2017 avatar May 30 '23 04:05 hjpark2017