Potential Bug for vectorizer_model
Hi,
Again, thanks for your amazing work on BERTopic. I have used BERTopic in many research projects.
I recently noticed what may be a tricky bug with vectorizer_model. I checked the code and found that when I pass a customized vectorizer_model into BERTopic, the n_gram_range defined in the BERTopic class is not passed to it. Instead, we need to pass such arguments directly to the vectorizer_model when it is created.
Thanks for sharing! This is actually not a bug but by design. The underlying idea is that users not familiar with the CountVectorizer can directly use the n_gram_range parameter. However, when you use vectorizer_model, it overrides n_gram_range, since you are creating your own custom vectorizer model. Other parameters related to that should have no effect.
In other words, you set the n-gram range either through BERTopic's n_gram_range parameter or through vectorizer_model, but never both.
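To make the "either/or" rule concrete, here is a minimal sketch of that precedence using only scikit-learn. The helper `resolve_vectorizer` is a hypothetical name introduced for illustration and is not part of BERTopic; it only mimics the behavior described above, assuming that a supplied vectorizer is used as-is and that n_gram_range only matters when none is given.

```python
from sklearn.feature_extraction.text import CountVectorizer


def resolve_vectorizer(n_gram_range, vectorizer_model=None):
    """Hypothetical helper (not BERTopic's API) mimicking the described design.

    A custom vectorizer, if given, is returned untouched and n_gram_range is
    ignored. Otherwise a default CountVectorizer is built from n_gram_range.
    """
    if vectorizer_model is not None:
        return vectorizer_model
    return CountVectorizer(ngram_range=n_gram_range)


# Case 1: no custom vectorizer, so n_gram_range is applied.
default_model = resolve_vectorizer(n_gram_range=(1, 3))
print(default_model.ngram_range)

# Case 2: custom vectorizer, so its own ngram_range wins and (1, 3) is ignored.
custom = CountVectorizer(ngram_range=(1, 2), stop_words="english")
chosen = resolve_vectorizer(n_gram_range=(1, 3), vectorizer_model=custom)
print(chosen.ngram_range)
```

In practice this means setting `ngram_range` (scikit-learn's spelling, without the extra underscore) on the `CountVectorizer` itself and then passing that object as `vectorizer_model` to BERTopic, rather than also setting `n_gram_range` on BERTopic.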
Okay, I see, that's good. Thanks for your always-instant replies, and cool design! You may consider adding a note about this in the docs. Thanks again.
No problem! It's actually already there 😉
https://github.com/MaartenGr/BERTopic/blob/8985f26d4ee89b4c512ff9da22a61371c20605b8/bertopic/_bertopic.py#L155-L159