
KeyError while trying to create a topicmodel

Open khalidkhannz78PK opened this issue 11 years ago • 8 comments

Hi there,

I have been trying to follow the tutorial on topic modelling on the main tethne website. I installed anaconda, tethne, nltk, and also mallet. But when I run the line

MyLDAModel = MyManager.build(Z=50, max_iter=300, prep=True)

I get the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 108, in build
    self.prep()
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 89, in prep
    self._generate_corpus(meta)
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/mallet.py", line 152, in _generate_corpus
    vocab=self.D.features[self.feature]['index'] )
  File "//anaconda/lib/python2.7/site-packages/tethne/writers/corpora.py", line 59, in to_documents
    meta += [ str(metadict[p][f]) for f in metakeys ]
KeyError: '10.1525/rac.2006.16.1.95'

I would appreciate any help with this.

khalidkhannz78PK avatar Jan 12 '15 03:01 khalidkhannz78PK

Ack, this bug won't die. There were a couple of places where we assumed that metadata records and feature sets were complete for all papers in a corpus, which is often false. This should be an easy fix; hopefully I can get a patch out next week.

Thanks for reporting this!
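
For anyone hitting this before a patch is out: the traceback shows to_documents indexing metadict[p][f] directly, which raises KeyError as soon as a paper has no metadata record. Below is a minimal sketch of a more tolerant lookup. It is not the actual fix in tethne; safe_meta_row and the missing placeholder are hypothetical names, and the sample data is made up for illustration.

```python
def safe_meta_row(metadict, paper_id, metakeys, missing=''):
    """Return one string per metadata key, tolerating absent records or fields.

    Hypothetical helper, not part of tethne.
    """
    # Fall back to an empty record when the paper has no metadata at all.
    record = metadict.get(paper_id, {})
    # Fall back to the placeholder when a single field is absent.
    return [str(record.get(key, missing)) for key in metakeys]


# Illustrative data only: one paper with metadata, one without.
metadict = {'paper-a': {'date': 2006, 'jtitle': 'Example Journal'}}
metakeys = ['date', 'jtitle']

row_present = safe_meta_row(metadict, 'paper-a', metakeys)
row_absent = safe_meta_row(metadict, '10.1525/rac.2006.16.1.95', metakeys)
```

Whether an empty placeholder is the right choice, or whether such papers should be skipped instead, depends on how the metadata is used downstream.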

erickpeirson avatar Jan 16 '15 05:01 erickpeirson

Hi Erick,

Is there any update on fixing this issue?

khalidkhannz78PK avatar Jan 26 '15 03:01 khalidkhannz78PK

Yes, sorry it took so long. The patched version is available as release v0.6.3.3-beta2, or via PyPI.

If you're using pip, you should be able to just do:

$ pip uninstall tethne
$ pip install tethne --pre

Let me know whether this solves the problem.

erickpeirson avatar Feb 03 '15 14:02 erickpeirson

Hi Erick,

You may already have noticed the MALLET path error on Windows, or heard about it from another tethne user.

When I try to build the model with the following call, I get the error below on Windows. The same program runs fine on Linux.

model = M.build(Z=50, max_iter=300, prep=True)

OSError                                   Traceback (most recent call last)
in ()
----> 1 model = M.build(Z=50, max_iter=300, prep=True)

C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in build(self, Z, max_iter, prep, **kwargs)
    106         if not self.prepped:
    107             if prep:
--> 108                 self.prep()
    109             else:
    110                 raise RuntimeError('Not so fast! Call prep() or set prep=True.')

C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in prep(self, meta)
     87         """
     88
---> 89         self._generate_corpus(meta)
     90         self.prepped = True
     91

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _generate_corpus(self, meta)
    152                                 vocab=self.D.features[self.feature]['index'] )
    153
--> 154         self._export_corpus()
    155
    156     def _export_corpus(self):

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _export_corpus(self)
    171
    172         except OSError: # Raised if mallet_path is bad.
--> 173             raise OSError("MALLET path invalid or non-existent.")
    174
    175         if exit != 0:

OSError: MALLET path invalid or non-existent.

Does the MALLET path need to be given in a specific format on Windows?

mubashirqasim avatar Feb 16 '15 10:02 mubashirqasim

Hi @mubashirqasim,

Can you post your code for initializing the MALLETModelManager? Its constructor accepts a parameter mallet_path, and I'm specifically interested in what you're passing there.

Tethne is almost entirely untested in Windows. Maybe if I get some time/funding I'll start pushing it in that direction, but until then I'm afraid that you'll find plenty of odd things when you run Tethne in Windows.

erickpeirson avatar Feb 16 '15 14:02 erickpeirson

Hi Erick,

Thanks for the prompt response. Here is the code to call MALLETModelManager.

from tethne.model.managers import MALLETModelManager
malletpath = 'c:/mallet'
outpath = 'c:/tmp/out'
feature = 'unigrams_filtered'
MyManager = MALLETModelManager(MyCorpus, feature, outpath, mallet_path=malletpath)
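
One thing that may be worth checking independently of tethne is whether a MALLET launcher actually exists under that directory. MALLET distributions ship a shell script at bin/mallet and a batch file at bin/mallet.bat; how tethne invokes them is not shown in this thread, so the sketch below only tests for the file and makes no claim about tethne's internals. find_mallet_launcher is a hypothetical helper name.

```python
import os


def find_mallet_launcher(mallet_path):
    """Return the expected MALLET launcher path, or None if it is missing.

    Hypothetical helper, not part of tethne or MALLET.
    """
    # On Windows the launcher is a batch file; elsewhere it is a shell script.
    name = 'mallet.bat' if os.name == 'nt' else 'mallet'
    launcher = os.path.join(mallet_path, 'bin', name)
    return launcher if os.path.isfile(launcher) else None


# Same directory as in the snippet above.
launcher = find_mallet_launcher('c:/mallet')
if launcher is None:
    print('No MALLET launcher found under c:/mallet/bin')
else:
    print('Found launcher: ' + launcher)
```

If the launcher is found and the error persists, the problem is more likely in how the command is executed on Windows than in the path format itself.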

mubashirqasim avatar Feb 16 '15 19:02 mubashirqasim

Flagging this for a future Windows-compatible version

erickpeirson avatar May 26 '15 22:05 erickpeirson

This may be fixed in v0.8-beta. If anyone has a chance to test this in Windows, I'd appreciate hearing about it!

erickpeirson avatar Jul 11 '16 15:07 erickpeirson