KeyError while trying to create a topic model
Hi there,
I have been trying to follow the topic modelling tutorial on the main Tethne website. I installed Anaconda, Tethne, NLTK, and MALLET, but when I run the line
MyLDAModel = MyManager.build(Z=50, max_iter=300, prep=True)
I get the following error:
Traceback (most recent call last):
File "
I would appreciate any help with this.
Ack, this bug won't die. There were a couple of places where we assumed that metadata records and feature sets were complete for every paper in a corpus, which is often false. This should be an easy fix; hopefully I can get a patch out next week.
Thanks for reporting this!
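For readers hitting the same KeyError: the traceback ends in a lookup of the form `metadict[p][f]`, which fails as soon as one paper has no metadata record. The following is only a minimal sketch of a tolerant lookup, not the actual Tethne patch. The function name `metadata_rows`, the `default` parameter, and the DOI `10.1000/example.1` are hypothetical and used purely for illustration.

```python
def metadata_rows(papers, metadict, metakeys, default=''):
    """Build one row of metadata strings per paper.

    Hypothetical helper: papers with no record, or with missing
    fields, get `default` instead of raising a KeyError.
    """
    rows = []
    for p in papers:
        record = metadict.get(p, {})  # missing paper -> empty record
        rows.append([str(record.get(f, default)) for f in metakeys])
    return rows


# The second paper has no metadata record at all.
papers = ['10.1000/example.1', '10.1525/rac.2006.16.1.95']
metadict = {'10.1000/example.1': {'date': 2006, 'jtitle': 'Example'}}
rows = metadata_rows(papers, metadict, ['date', 'jtitle'])
print(rows)
```

Whether an empty string is the right placeholder depends on how the downstream consumer reads the metadata columns, so treat the default as a choice to revisit.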
On Jan 11, 2015, at 8:14 PM, khalidkhannz78PK wrote:
I get the following error:
Traceback (most recent call last):
  File "", line 1, in
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 108, in build
    self.prep()
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 89, in prep
    self._generate_corpus(meta)
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/mallet.py", line 152, in _generate_corpus
    vocab=self.D.features[self.feature]['index'] )
  File "//anaconda/lib/python2.7/site-packages/tethne/writers/corpora.py", line 59, in to_documents
    meta += [ str(metadict[p][f]) for f in metakeys ]
KeyError: '10.1525/rac.2006.16.1.95'
Hi Erick,
Is there any update on fixing this issue?
Yes, sorry it took so long. The patched version is available as release v0.6.3.3-beta2, or via PyPI.
If you're using pip, you should be able to just do:
$ pip uninstall tethne
$ pip install tethne --pre
Let me know whether this solves the problem.
Hi Erick,
You may have already noticed the MALLET path error on Windows, or heard about it from another Tethne user.
When I try to build the model with the following call, I get the error below on Windows. The same program runs fine on Linux.
model = M.build(Z=50, max_iter=300, prep=True)
OSError Traceback (most recent call last)
C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in build(self, Z, max_iter, prep, **kwargs)
    106         if not self.prepped:
    107             if prep:
--> 108                 self.prep()
    109             else:
    110                 raise RuntimeError('Not so fast! Call prep() or set prep=True.')

C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in prep(self, meta)
     87         """
     88
---> 89         self._generate_corpus(meta)
     90         self.prepped = True
     91

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _generate_corpus(self, meta)
    152             vocab=self.D.features[self.feature]['index'] )
    153
--> 154         self._export_corpus()
    155
    156     def _export_corpus(self):

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _export_corpus(self)
    171
    172         except OSError:     # Raised if mallet_path is bad.
--> 173             raise OSError("MALLET path invalid or non-existent.")
    174
    175         if exit != 0:

OSError: MALLET path invalid or non-existent.
Does the MALLET path need to be given in a specific format on Windows?
Hi @mubashirqasim,
Can you post your code for initializing the MALLETModelManager? Its constructor accepts a parameter mallet_path, and I'm specifically interested in what you're passing there.
Tethne is almost entirely untested in Windows. Maybe if I get some time/funding I'll start pushing it in that direction, but until then I'm afraid that you'll find plenty of odd things when you run Tethne in Windows.
Hi Erick,
Thanks for the prompt response. Here is the code that initializes the MALLETModelManager:
from tethne.model.managers import MALLETModelManager
malletpath = 'c:/mallet'
outpath = 'c:/tmp/out'
feature = 'unigrams_filtered'
MyManager = MALLETModelManager(MyCorpus, feature, outpath, mallet_path=malletpath)
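One way to narrow down the "MALLET path invalid or non-existent" error is to check what actually exists under the directory passed as `mallet_path`. The sketch below is a diagnostic only and makes assumptions: it relies on the standard MALLET distribution layout, where `bin` contains a `mallet` shell script and a `mallet.bat` launcher for Windows, and it does not reproduce how Tethne itself invokes MALLET. The helper name `describe_mallet_install` is hypothetical.

```python
import os


def describe_mallet_install(mallet_path):
    """Report which MALLET launcher files exist under mallet_path.

    Hypothetical diagnostic helper; it only inspects the filesystem
    and never runs MALLET.
    """
    bin_dir = os.path.join(mallet_path, 'bin')
    return {
        'path_exists': os.path.isdir(mallet_path),
        'unix_launcher': os.path.isfile(os.path.join(bin_dir, 'mallet')),
        'windows_launcher': os.path.isfile(os.path.join(bin_dir, 'mallet.bat')),
    }


if __name__ == '__main__':
    # Same path as in the snippet above.
    print(describe_mallet_install('c:/mallet'))
```

If `path_exists` is true and only the `.bat` launcher can be run on the machine, the problem is more likely in how the executable is called than in the format of the path string.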
Flagging this for a future Windows-compatible version
This may be fixed in v0.8-beta. If anyone has a chance to test this in Windows, I'd appreciate hearing about it!