BERTopic

There are Chinese characters in my project, but after calling the visualize_document_datamap() method, the characters appear as garbled text.

Open · superseanyoung opened this issue 1 year ago · 4 comments

Have you searched existing issues? 🔎

  • [X] I have searched and found no existing issues

Describe the bug

When I call topic_model.visualize_document_datamap() (full call under Reproduction), the Chinese characters, for example in title='文档和主题的分布' ("Distribution of documents and topics") and sub_title='基于 BERTopic 的主题建模' ("Topic modeling based on BERTopic"), are rendered as garbled text in the figure. Even after setting plt.rcParams['font.sans-serif'] = ['SimHei'], I still can't see the characters.
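Setting rcParams only helps if matplotlib can actually resolve a font file that contains the glyphs. As a diagnostic (not the fix discussed in this thread), the sketch below resolves a font family the way matplotlib does and lists the characters of a string that the resolved font file cannot draw. It assumes only matplotlib is installed; font_covers is a hypothetical helper name. When a family such as "SimHei" is not installed, findfont() falls back to matplotlib's default font (DejaVu Sans), which has no CJK glyphs.

```python
from matplotlib import font_manager
from matplotlib.ft2font import FT2Font


def font_covers(text, family):
    """Return (font_path, missing_chars) for the font matplotlib resolves for `family`.

    Hypothetical diagnostic helper: findfont() falls back to matplotlib's
    default font when `family` is not installed, so the returned path shows
    which font file would really be used.
    """
    prop = font_manager.FontProperties(family=family)
    path = font_manager.findfont(prop, fallback_to_default=True)
    # get_charmap() maps Unicode code points to glyph indices in this font file
    charmap = FT2Font(path).get_charmap()
    # Ignore whitespace; collect every character with no glyph in the font
    missing = [ch for ch in text if not ch.isspace() and ord(ch) not in charmap]
    return path, missing


if __name__ == "__main__":
    for family in ["SimHei", "DejaVu Sans"]:
        path, missing = font_covers("文档和主题的分布", family)
        print(f"{family!r} -> {path}")
        print("  missing glyphs:", "".join(missing) or "none")
```

If "SimHei" resolves to a DejaVu Sans file, matplotlib cannot see that font at all. If it resolves correctly and the figure is still garbled, the font setting is presumably being overridden further down, inside DataMapPlot, which fits the maintainer's suggestion to raise it there.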

Reproduction

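from bertopic import BERTopic
from umap import UMAP

# Assumed from an earlier step (not shown in the issue): topic_model is a fitted
# BERTopic model, sentences are the documents, topics are the topic assignments
# from topic_model.fit_transform(...), and embeddings are the document embeddings.

# Reduce the embeddings to 2D for the data map
reduced_embeddings = UMAP(n_neighbors=15, n_components=2, min_dist=0.0, metric='cosine').fit_transform(embeddings)
fig = topic_model.visualize_document_datamap(
    sentences,
    topics=topics,
    reduced_embeddings=reduced_embeddings,
    # custom_labels=custom_labels,
    title='文档和主题的分布',  # "Distribution of documents and topics"
    sub_title='基于 BERTopic 的主题建模',  # "Topic modeling based on BERTopic"
    width=1200,
    height=1200
)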

BERTopic Version

0.16.4

superseanyoung · Nov 12 '24 08:11

Hmmm, I'm not entirely sure what is needed here. Have you tried opening an issue on the DataMapPlot repository? There isn't much I can do on my end, since BERTopic just calls that package and passes the data to it.

MaartenGr · Nov 12 '24 09:11

Can font display parameters be set through the visualize_document_datamap() method?

superseanyoung · Nov 13 '24 05:11

@superseanyoung You can check all parameters implemented here or here

MaartenGr · Nov 13 '24 14:11

For future people, please see my reply here: https://github.com/TutteInstitute/datamapplot/issues/50

NullPxl · Dec 06 '24 21:12