enstop
Ensemble topic modelling with pLSA
When using `model.transform()` on new unseen data, the following error occurs: ``` --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in 1 test_corpus = df1['cleaned_text'].tolist() 2 test_dtm = vectorizer.transform(test_corpus) ----> 3...
Hi @lmcinnes thanks for this nice code here... I am looking for a solution for the following task: I have a cluster of small texts and want to extract the...
i guess :)
Dear Leland, I tried to use pyLDAvis with enstop, following the same API as the sklearn topic models. I did essentially what is shown here https://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/sklearn.ipynb and replaced scikit-learn's LatentDirichletAllocation with...
When I run the following code: ``` ens_model = EnsembleTopics(n_components=20, n_starts=8, n_jobs=2).fit(data_vec) ``` I get the error: ``` --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in d:\pycharmprojects\biclustering\venv\lib\site-packages\enstop\enstop_.py in fit(self,...
PLSA and other methods give strange coherence scores: ``` PLSA(n_components=3).fit(data_vec).coherence() PLSA(n_components=4).fit(data_vec).coherence() ``` ``` n=5, -894.0931521853117 n=4, -846.5056881515624 n=1000, -548.1772075123278 ``` When I use gensim, I get quite a good score:...
I can't run the NMF algorithm. When I run: ``` %%time nmf_model = NMF(n_components=20, beta_loss='kullback-leibler', solver='mu').fit(data) ``` ... I see the following error stack: ``` --------------------------------------------------------------------------- FloatingPointError Traceback (most recent...
The code from your homepage:

```
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from enstop import EnsembleTopics

news = fetch_20newsgroups(subset='all')
data = CountVectorizer().fit_transform(news.data)
model = EnsembleTopics(n_components=20).fit(data)
topics = model.components_
doc_vectors...
```