Word cloud is broken
Describe the bug
The word cloud shows differently sized words, even though every word is unique and only occurs once.
Python
import pandas as pd
from dataprep.eda import plot

# Ten unique words, each occurring exactly once
df = pd.DataFrame({'name': [str(i) for i in range(10)]})
rep = plot(df, 'name')
Expected behavior
Words with the same frequency should have the same size.
Screenshots

Desktop
- OS: Windows 10
- Platform: Windows Powershell
- Platform Version [e.g. 1.0]
- Python Version Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
- Dataprep Version: dataprep-0.2.11-py3-none-any.whl
Hi @vitamins. Thanks for creating this issue! We use the wordcloud library to create the word cloud, and this behavior appears to be part of its layout algorithm: https://github.com/amueller/word_cloud/issues/285. The solution suggested there is to specify max_font_size, which does work:
However, it is not easy to determine the optimal max_font_size for an arbitrary word cloud. The max_font_size used for the above plot will not work for longer strings:
I think the optimal max_font_size is a function of the number of words and their lengths, which we could try to determine by trial and error. This could be particularly difficult since some characters are wider than others. However, we also show the word frequency bar chart, which gives a more accurate reading of the frequency of each word.
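To make the "function of the number of words and their lengths" idea concrete, here is a rough sketch of one possible starting estimate. It is not what dataprep or wordcloud does. The function name estimate_max_font_size, the 400x200 canvas, the 0.6 glyph width-to-height ratio, and the 0.3 fill fraction are all assumptions chosen for illustration, and every character is treated as equally wide, which is exactly the simplification mentioned above.

```python
import math


def estimate_max_font_size(words, width=400, height=200,
                           glyph_aspect=0.6, fill=0.3):
    """Rough shared font size (in pixels) for words of equal frequency.

    Hypothetical heuristic: each character is modelled as a box that is
    `glyph_aspect * size` wide and `size` tall, and all boxes together
    may cover at most a `fill` fraction of the canvas.
    """
    words = [w for w in words if w]  # ignore empty strings
    if not words:
        raise ValueError("need at least one non-empty word")

    total_chars = sum(len(w) for w in words)
    longest = max(len(w) for w in words)

    # Area bound: total_chars * glyph_aspect * size**2 <= fill * width * height
    by_area = math.sqrt(fill * width * height / (glyph_aspect * total_chars))
    # Width bound: the longest word has to fit on one horizontal line
    by_width = width / (glyph_aspect * longest)

    # The font can never be taller than the canvas; keep at least 1 px
    return max(1, int(min(by_area, by_width, height)))


short_words = [str(i) for i in range(10)]
long_words = ["word" * 5 + str(i) for i in range(10)]

short_size = estimate_max_font_size(short_words)
long_size = estimate_max_font_size(long_words)
print(short_size, long_size)
```

The result could be passed as max_font_size to WordCloud and then lowered step by step if the words still come out in different sizes, so it would only shorten the trial and error, not replace it.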