1. Python 3.7.10
2. 4,196,900 training points
3. Yes, the embedding dim is 391.

It is probably a pickle issue. The problem does not arise when I set `ns_method` to `kcentroid`,...
Thanks for looking into it. When the `ns_method` in the config is set to `ensemble`, it still gives this error:
```
~/.local/lib/python3.7/site-packages/xclib-0.97-py3.7-linux-x86_64.egg/xclib/utils/shortlist.py in save(self, fname)
    394
    395     def...
```
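If it really is a pickle problem, one common culprit on Python 3.7 is the default pickle protocol (3), which cannot serialize objects larger than 4 GiB; protocol 4 lifts that limit. A minimal sketch of the workaround, assuming the object being saved is otherwise picklable (`save_large` is an illustrative helper, not xclib API):
```
import pickle

def save_large(obj, fname):
    # Protocol 4 (Python 3.4+, but not the default on 3.7) removes
    # the 4 GiB per-object limit of protocols <= 3.
    with open(fname, "wb") as f:
        pickle.dump(obj, f, protocol=4)
```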
`nnz` is set to 100 (I think that's the default); the number of labels is 533,213 and the number of training data points is 328,641.
Got it. The label file for training is 261 MB (trn_X_Y.txt).
The size of the label matrix is 64 (bytes, per `sys.getsizeof`). This is how I got the size:
```
import sys
from xclib.data.data_utils import read_sparse_file

label_path = "trn_X_Y.txt"
label_matrix = read_sparse_file(label_path, safe_read=False)
sys.getsizeof(label_matrix)...
```
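For what it's worth, `sys.getsizeof` only measures the thin Python wrapper of a scipy sparse matrix, not the numpy arrays holding its contents, which is why it reports a constant ~64 bytes. A sketch of measuring the real footprint, assuming `read_sparse_file` returns a CSR matrix:
```
from xclib.data.data_utils import read_sparse_file

label_matrix = read_sparse_file("trn_X_Y.txt", safe_read=False)

# A scipy.sparse CSR matrix stores its contents in three numpy arrays;
# summing their .nbytes gives the actual in-memory size.
nbytes = (label_matrix.data.nbytes
          + label_matrix.indices.nbytes
          + label_matrix.indptr.nbytes)
print(f"actual size: {nbytes / 1024 ** 2:.1f} MiB")
```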
Thanks for your insight! Does that mean that there are different embeddings for every dataset? The way I tried to generate the embedding files was using, for example, wiki.en.vec, reading...
Thanks for the clarification. So what I ended up doing was something like: `model = fasttext.train_unsupervised(corpus_file, dim=dim)`, and then using `vocab = model.words`, creating an np array of V (`len(vocab)`)...
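A minimal sketch of that flow, assuming the `fasttext` Python package and a whitespace-tokenized `corpus_file` (the `dim` value and output filename are illustrative):
```
import fasttext
import numpy as np

dim = 300                   # illustrative embedding dimension
corpus_file = "corpus.txt"  # one tokenized document per line

# Train unsupervised fastText embeddings on the raw corpus.
model = fasttext.train_unsupervised(corpus_file, dim=dim)

# Build a V x dim matrix, one row per vocabulary word.
vocab = model.words
embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
for i, word in enumerate(vocab):
    embeddings[i] = model.get_word_vector(word)

# Save for later use; the filename is illustrative.
np.save("word_embeddings.npy", embeddings)
```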