
BERTopic 0.16.0: fit_transform fails when a representation model is used together with zeroshot_topic_list

Open amitca71 opened this issue 1 year ago • 3 comments

from bertopic import BERTopic

2024-05-02 10:26:56,345 - BERTopic - Zeroshot Step 2 - Completed ✓
2024-05-02 10:26:56,346 - BERTopic - Zeroshot Step 3 - Combining clustered topics with the zeroshot model
KeyError: '-1'
File , line 18
      1 from bertopic import BERTopic
      3 topic_model = BERTopic(
      4
      5     # Pipeline models
   (...)
     15     verbose=True
     16 )
---> 18 topics, probs = topic_model.fit_transform(docs, embeddings)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-c320d35e-2ba0-4086-9066-6452698cd8ba/lib/python3.11/site-packages/bertopic/_bertopic.py:3150, in BERTopic.merge_models(cls, models, min_similarity, embedding_model)
   3147 merged_topics["topic_labels"][str(new_topic_val)] = selected_topics["topic_labels"][str(new_topic)]
   3149 if selected_topics["topic_aspects"]:
-> 3150     merged_topics["topic_aspects"][str(new_topic_val)] = selected_topics["topic_aspects"][str(new_topic)]
   3152 # Add new embeddings
   3153 new_tensors = tensors[new_topic - selected_topics["_outliers"]]

topic_model = BERTopic(

    # Pipeline models
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
    zeroshot_topic_list=zero_shot_topics_list,
    zeroshot_min_similarity=.8,
    representation_model=representation_model,

    # Hyperparameters
    top_n_words=10,
    verbose=True
)

topics, probs = topic_model.fit_transform(docs, embeddings)

amitca71 avatar May 02 '24 10:05 amitca71
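
For readers unfamiliar with the exception in the traceback above, here is a minimal sketch of how a `KeyError: '-1'` arises from a dictionary keyed by stringified topic ids. The dictionaries and the `copy_aspect` helper are hypothetical illustrations, not BERTopic internals, and the sketch does not claim to identify the actual root cause in `merge_models`.

```python
# Hypothetical data, NOT BERTopic's real structures: keys are stringified topic ids.
topic_labels = {"-1": "outliers", "0": "sports", "1": "politics"}
topic_aspects = {"0": ["ball", "team"], "1": ["vote", "party"]}  # no "-1" entry


def copy_aspect(new_topic: int, strict: bool = True):
    """Look up the aspect entry for a topic id (hypothetical helper)."""
    key = str(new_topic)
    if strict:
        # Direct indexing raises KeyError when the key is absent.
        return topic_aspects[key]
    # A guarded lookup returns None instead of raising.
    return topic_aspects.get(key)


try:
    copy_aspect(-1)
except KeyError as err:
    # str() of a KeyError is the repr of the missing key, hence the quotes.
    message = f"KeyError: {err}"

print(message)
```

The quoted `'-1'` in the message indicates that the missing key is the string `"-1"`, the id conventionally used for the outlier topic, rather than the integer `-1`.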

This was indeed an issue in 0.16.0. It might be fixed in 0.16.1, but I'm not sure it will work. There's currently a PR open for 0.16.1 that fixes another issue.

MaartenGr avatar May 02 '24 14:05 MaartenGr

The reason I'm working with 0.16.0 is that zero-shot is failing on 0.16.1. I saw there are already open issues for that.

amitca71 avatar May 02 '24 21:05 amitca71

Have you tried 0.16.1 with the PR I mentioned above? I think that should solve your issue.

MaartenGr avatar May 03 '24 07:05 MaartenGr
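
Since the thread turns on which release is installed (0.16.0 versus 0.16.1), a quick way to confirm the version in the active environment may help when reproducing this. The sketch below uses only the standard library; the `installed_version` helper name is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed BERTopic version, or None if it is not installed.
print(installed_version("bertopic"))
```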