Issue with gensim 4.0.0+
It appears that one of the argument names has changed in the newly released version of gensim: Word2Vec no longer accepts size, which was renamed to vector_size in 4.0.0. This has also caused some pain in other libraries using this package for node2vec implementations (e.g., https://github.com/krishnanlab/PecanPy/issues/16).
Traceback (most recent call last):
  File "embed_nodevectors.py", line 150, in <module>
    main()
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "embed_nodevectors.py", line 137, in main
    model.fit(graph)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/nodevectors/node2vec.py", line 130, in fit
    self.model = gensim.models.Word2Vec(
TypeError: __init__() got an unexpected keyword argument 'size'
Thanks.
I can make a patch that checks the gensim version for now and routes the argument depending on the version.
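A rough sketch of what such a version check could look like. The helper name word2vec_dim_kwarg is made up for illustration and is not part of nodevectors; it only assumes that gensim 4.0.0 and later expect vector_size where earlier releases expected size.

```python
def word2vec_dim_kwarg(gensim_version, n_components):
    """Return the embedding-size keyword argument for a given gensim version.

    Hypothetical helper: gensim < 4 uses `size`, gensim >= 4 uses `vector_size`.
    """
    # Only the major version matters for this rename.
    major = int(gensim_version.split(".")[0])
    key = "vector_size" if major >= 4 else "size"
    return {key: n_components}


# Intended use (needs gensim installed; `walks` stands in for the random walks):
#   import gensim
#   kwargs = word2vec_dim_kwarg(gensim.__version__, 32)
#   model = gensim.models.Word2Vec(sentences=walks, **kwargs)
```

Other keyword arguments renamed in gensim 4 would need the same treatment if they are passed through.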
Long term the idea would be to remove the gensim dependency entirely. It's a heavy dependency that's a moving target and only used for this one part of Node2Vec.
It also adds a lot of overhead for Node2Vec. For one, we need to map the node IDs in the random walks back to node names, in a format gensim accepts.
We could instead train a word2vec model directly on the node IDs (ints, so it would be faster) and re-map the embedding dictionary keys from node ID -> node name only once, after everything is trained.
This could be achieved either by stripping the node2vec C code and integrating it in CSRGraphs or by using another C/C++ implementation, like this one:
https://github.com/xgfs/node2vec-c
(which works on CSR representation already, not too far from csrgraphs) or this one:
https://github.com/snap-stanford/snap/tree/master/examples/node2vec
and integrating it into CSRGraphs.
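To illustrate the "re-map only once" idea from above, here is a minimal sketch in plain Python. The function name, the node names, and the vectors are all invented for the example; in practice the vectors would come from whichever word2vec implementation ends up being used.

```python
def remap_embeddings(id_vectors, node_names):
    """Turn {node_id: vector} into {node_name: vector} in a single pass.

    `node_names[i]` is assumed to hold the name of the node with integer ID i.
    """
    return {node_names[node_id]: vec for node_id, vec in id_vectors.items()}


# Hypothetical data: list index == node ID.
node_names = ["TP53", "MDM2", "EGFR"]
# Stand-in for embeddings trained directly on integer node IDs.
id_vectors = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}

named = remap_embeddings(id_vectors, node_names)
```

The name lookup then happens once per node at the end, instead of once per token in every random walk.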
The following bash command, which pins gensim to a release before 4.0.0, worked for me:
pip3 install -I gensim==3.8.0
That did not solve my problem. The error is still there.