nodevectors

Issue with gensim 4.0.0+

Open cthoyt opened this issue 4 years ago • 3 comments

It appears that one of the argument names changed in the newly released gensim 4.0.0: Word2Vec no longer accepts 'size', which was renamed to 'vector_size'. This has also caused problems in other libraries that use gensim for their node2vec implementations (e.g., https://github.com/krishnanlab/PecanPy/issues/16).

Traceback (most recent call last):
  File "embed_nodevectors.py", line 150, in <module>
    main()
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "embed_nodevectors.py", line 137, in main
    model.fit(graph)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/nodevectors/node2vec.py", line 130, in fit
    self.model = gensim.models.Word2Vec(
TypeError: __init__() got an unexpected keyword argument 'size'

cthoyt avatar Apr 27 '21 11:04 cthoyt

Thanks.

I can make a patch that checks the gensim version for now and routes the argument depending on the version.
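A minimal sketch of what such a version check could look like. The helper name `word2vec_size_kwargs` is hypothetical and not part of nodevectors; it assumes only that gensim 4.0 renamed the `size` keyword to `vector_size`, and that the version string starts with an integer major version.

```python
def word2vec_size_kwargs(gensim_version: str, dimensions: int) -> dict:
    """Return the embedding-size keyword that matches the gensim version.

    Hypothetical helper: gensim < 4 expects `size`, gensim >= 4 expects
    `vector_size`. Only the major version number is inspected.
    """
    major = int(gensim_version.split(".")[0])
    key = "vector_size" if major >= 4 else "size"
    return {key: dimensions}


# Intended use inside fit(), assuming gensim is installed:
#   import gensim
#   kwargs = word2vec_size_kwargs(gensim.__version__, 32)
#   model = gensim.models.Word2Vec(sentences=walks, **kwargs)
```

Passing the version string in as an argument keeps the routing logic separate from the import, so it can be checked without gensim installed.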

Long term the idea would be to remove the gensim dependency entirely. It's a heavy dependency that's a moving target and only used for this one part of Node2Vec.

It also adds a lot of overhead for Node2Vec. For one, we need to map the node IDs in the random walks back to node names in a format gensim accepts.

We could just train a word2vec model directly on the nodeIDs (ints, so would be faster) and re-map the embedding dictionary keys from nodeID -> node name only once after everything is trained.
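The one-time re-mapping step could be sketched roughly as follows. This is only an illustration: the function name `remap_embedding_keys`, the dictionary of vectors keyed by integer node ID, and the list of node names indexed by node ID are all assumptions, not existing nodevectors or CSRGraphs structures.

```python
from typing import Dict, Hashable, List, Sequence


def remap_embedding_keys(
    id_to_vector: Dict[int, List[float]],
    node_names: Sequence[Hashable],
) -> Dict[Hashable, List[float]]:
    """Re-key trained embeddings from integer node ID to node name.

    Assumes node_names[i] is the name of the node with integer ID i.
    This runs once after training, rather than once per random-walk step.
    """
    return {node_names[node_id]: vector for node_id, vector in id_to_vector.items()}


# Toy example with made-up vectors and names.
id_to_vector = {0: [0.1, 0.2], 1: [0.3, 0.4]}
node_names = ["gene_a", "gene_b"]
named = remap_embedding_keys(id_to_vector, node_names)
```

The training loop would then only ever see integers, and the name lookup cost is paid once per node instead of once per walk element.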

This could be achieved either by stripping the node2vec C code and integrating it in CSRGraphs or by using another C/C++ implementation, like this one:

https://github.com/xgfs/node2vec-c

(which works on CSR representation already, not too far from csrgraphs) or this one:

https://github.com/snap-stanford/snap/tree/master/examples/node2vec

and integrating it into CSRGraphs.

VHRanger avatar Apr 27 '21 16:04 VHRanger

Following bash command worked for me:

pip3 install -I gensim==3.8.0

hhu1 avatar Jul 07 '21 16:07 hhu1

Following bash command worked for me:

pip3 install -I gensim==3.8.0

That did not solve my problem. The error is still there.

Wapiti08 avatar Feb 21 '22 14:02 Wapiti08