1. Python 3.7.10
2. 4,196,900 training points
3. Yes, the embedding dim is 391.

It is probably a pickle issue. The problem does not arise when I set `ns_method` to `kcentroid`,...
Thanks for looking into it. When the `ns_method` in the config is set to `ensemble`, it still gives this error:
```
~/.local/lib/python3.7/site-packages/xclib-0.97-py3.7-linux-x86_64.egg/xclib/utils/shortlist.py in save(self, fname)
    394
    395     def...
```
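If it really is a pickle problem, one common culprit on Python 3.7 is the default pickle protocol (3), which cannot serialize objects larger than 4 GiB; protocol 4 lifts that limit. A minimal sketch of the workaround, assuming the object being saved is otherwise picklable (`save_large` is an illustrative helper, not xclib API):
```
import pickle

def save_large(obj, fname):
    # Protocol 4 (Python 3.4+, but not the default on 3.7) removes
    # the 4 GiB per-object limit of protocols <= 3.
    with open(fname, "wb") as f:
        pickle.dump(obj, f, protocol=4)
```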
`nnz` is set to 100 (I think that's the default); the number of labels is 533,213 and the number of training data points is 328,641.
Got it. The label file for training is 261 MB (trn_X_Y.txt).
The size of the label matrix is 64 (bytes, per `sys.getsizeof`). This is how I got the size:
```
import sys
from xclib.data.data_utils import read_sparse_file

label_path = "trn_X_Y.txt"
label_matrix = read_sparse_file(label_path, safe_read=False)
sys.getsizeof(label_matrix)...
```
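For what it's worth, `sys.getsizeof` only measures the thin Python wrapper of a scipy sparse matrix, not the numpy arrays holding its contents, which is why it reports a constant ~64 bytes. A sketch of measuring the real footprint, assuming `read_sparse_file` returns a CSR matrix:
```
from xclib.data.data_utils import read_sparse_file

label_matrix = read_sparse_file("trn_X_Y.txt", safe_read=False)

# A scipy.sparse CSR matrix stores its contents in three numpy arrays;
# summing their .nbytes gives the actual in-memory size.
nbytes = (label_matrix.data.nbytes
          + label_matrix.indices.nbytes
          + label_matrix.indptr.nbytes)
print(f"actual size: {nbytes / 1024 ** 2:.1f} MiB")
```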
Thanks for your insight! Does that mean that there are different embeddings for every dataset? The way I tried to generate the embedding files was using, for example, wiki.en.vec, reading...
Thanks for the clarification. So what I ended up doing was something like: `model = fasttext.train_unsupervised(corpus_file, dim=dim)`, and then using `vocab = model.words`, creating an np array of V (`len(vocab)`)...
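A minimal sketch of that flow, assuming the `fasttext` Python package and a whitespace-tokenized `corpus_file` (the `dim` value and output filename are illustrative):
```
import fasttext
import numpy as np

dim = 300                   # illustrative embedding dimension
corpus_file = "corpus.txt"  # one tokenized document per line

# Train unsupervised fastText embeddings on the raw corpus.
model = fasttext.train_unsupervised(corpus_file, dim=dim)

# Build a V x dim matrix, one row per vocabulary word.
vocab = model.words
embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
for i, word in enumerate(vocab):
    embeddings[i] = model.get_word_vector(word)

# Save for later use; the filename is illustrative.
np.save("word_embeddings.npy", embeddings)
```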