
Unfair comparison between ProtBert and ESM

Open · ww-rm opened this issue on Feb 29, 2024 · 0 comments

In ProtTrans, the authors say that:

No auxiliary tasks like BERT's next-sentence prediction were used for any model described here.

But in PEER, the [CLS] token is used as the protein-level embedding for ProtBert. Since ProtBert was never trained with a sentence-level objective, its [CLS] token may not be able to represent the whole sequence.

For ProtBert, should we use the same strategy as for ESM (i.e., mean pooling over all residues) to get a fairer comparison?
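For reference, a minimal sketch of the two pooling strategies, assuming the Rostlab/prot_bert checkpoint on Hugging Face (this is not the PEER benchmark's actual code path, and the sequence is illustrative only):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert").eval()

# ProtBert expects space-separated residues.
seq = " ".join("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
inputs = tokenizer(seq, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 1024)

# Strategy PEER uses for ProtBert: the [CLS] token embedding.
cls_embedding = hidden[:, 0]  # (1, 1024)

# Strategy used for ESM: mean pooling over residue positions,
# here excluding the [CLS]/[SEP] special tokens.
special = tokenizer.get_special_tokens_mask(
    inputs["input_ids"][0].tolist(), already_has_special_tokens=True
)
residue_mask = ~torch.tensor(special, dtype=torch.bool)
mean_embedding = hidden[0, residue_mask].mean(dim=0, keepdim=True)  # (1, 1024)
```

Both strategies yield a fixed-size vector per protein, so swapping [CLS] pooling for mean pooling would not change the downstream heads, only the representation fed to them.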

ww-rm · Feb 29, 2024