flyvec
Some questions about the EMPIRICAL EVALUATION
When we use the checkpoint OWT_hid_400_W_11_LR_0.0002_14.npy to run the evaluation, we have the following questions:
- STATIC WORD EMBEDDINGS EVALUATION. The experimental results on several datasets are much worse than those reported in the paper, especially on the RW dataset. For a dataset of the form (Word1: [w10, w11], Word2: [w20, w21], Score: [s1, s2]), we took the following approach:
```python
import pandas as pd

# Binary sparse embeddings (hash codes) for each word pair
hs_10 = model.get_sparse_embedding(w10)['embedding']
hs_20 = model.get_sparse_embedding(w20)['embedding']
p1 = sum(hs_10 == hs_20) / hs_10.shape[0]  # fraction of matching bits

hs_11 = model.get_sparse_embedding(w11)['embedding']
hs_21 = model.get_sparse_embedding(w21)['embedding']
p2 = sum(hs_11 == hs_21) / hs_11.shape[0]

# Spearman correlation between model similarities and human scores
p = pd.Series([p1, p2])
s = pd.Series(Score)
model_score = p.corr(s, method='spearman')
```
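For a full word-similarity dataset the same per-pair computation generalizes to a loop. A minimal sketch of what we did; `DummyModel` is a hypothetical stand-in for the FlyVec model, assuming only that `get_sparse_embedding(w)['embedding']` returns a binary hash code:

```python
import numpy as np
import pandas as pd

def evaluate_similarity(model, words1, words2, scores):
    """Spearman correlation between hash-code overlap and human scores."""
    sims = []
    for w1, w2 in zip(words1, words2):
        h1 = model.get_sparse_embedding(w1)['embedding']
        h2 = model.get_sparse_embedding(w2)['embedding']
        sims.append(np.mean(h1 == h2))  # fraction of matching bits
    return pd.Series(sims).corr(pd.Series(scores), method='spearman')

# Hypothetical stand-in with 4-bit codes; real use would pass the flyvec model
class DummyModel:
    codes = {
        "a": np.array([1, 1, 0, 0]), "b": np.array([1, 1, 0, 0]),  # overlap 1.0
        "c": np.array([1, 0, 0, 0]), "d": np.array([0, 1, 0, 0]),  # overlap 0.5
        "e": np.array([1, 1, 1, 1]), "f": np.array([0, 0, 0, 0]),  # overlap 0.0
    }
    def get_sparse_embedding(self, w):
        return {"embedding": self.codes[w]}

rho = evaluate_similarity(DummyModel(), ["a", "c", "e"], ["b", "d", "f"], [3, 2, 1])
print(rho)  # 1.0: overlaps are perfectly rank-correlated with the scores
```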
- CONTEXT-DEPENDENT WORD EMBEDDINGS. How do we find the 10 nearest-neighbor words in the hash-code space? Or, how do you convert the static word embeddings into context-dependent word embeddings?
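For reference, here is how we currently imagine the nearest-neighbor lookup in hash-code space: rank the vocabulary by fraction of matching bits (i.e., one minus the normalized Hamming distance) and keep the top k. This is only our assumption, not the paper's confirmed procedure; `vocab_hashes` is a hypothetical word-to-hash-code mapping:

```python
import numpy as np

def hamming_nearest_neighbors(query_hash, vocab_hashes, k=10):
    """Return the k words whose binary hash codes overlap most with query_hash.

    vocab_hashes: dict mapping word -> binary (0/1) numpy array, all of equal
    length. Similarity = fraction of matching bits (1 - normalized Hamming
    distance).
    """
    sims = {word: np.mean(code == query_hash) for word, code in vocab_hashes.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy example with 8-bit codes (real FlyVec hash codes are much longer)
vocab = {
    "cat": np.array([1, 0, 1, 1, 0, 0, 1, 0]),
    "dog": np.array([1, 0, 1, 0, 0, 0, 1, 0]),
    "car": np.array([0, 1, 0, 0, 1, 1, 0, 1]),
}
query = vocab["cat"]
neighbors = hamming_nearest_neighbors(query, vocab, k=2)
print(neighbors)  # ['cat', 'dog']
```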