[Question] Why do Document-Based Scatterplots need category?

Open • fredguth opened this issue 6 years ago • 1 comment

Sorry to ask via the issue tracker. I tried to find the answer in the referenced arXiv article and did not know of a better channel.

I am trying to figure out how the Document-Based Scatterplot works.

I get that it uses TF-IDF on the unigrams of the text and takes the first 2 unigrams of the vector (the most different terms?) as the axes. But what function is applied to each document to find its x-y position? Its "nearness" to each term?
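To make that reading concrete, here is a minimal sketch of it in plain Python. It is not Scattertext's code and may not match what the library actually does. The function names `tfidf_rows` and `positions`, the whitespace tokenization, the unsmoothed TF-IDF formula, and the hand-picked axis terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {unigram: tf-idf weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace to get unigrams.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each unigram appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Assumption: tf = relative frequency, idf = log(N / df), no smoothing.
        rows.append({
            term: (counts[term] / total) * math.log(n_docs / df[term])
            for term in counts
        })
    return rows


def positions(docs, x_term, y_term):
    """Guessed placement: a document's (x, y) is its weight for two chosen terms."""
    rows = tfidf_rows(docs)
    return [(row.get(x_term, 0.0), row.get(y_term, 0.0)) for row in rows]


docs = ["taxes jobs taxes", "jobs health", "health health care"]
coords = positions(docs, "taxes", "health")
for doc, (x, y) in zip(docs, coords):
    print(f"{doc!r}: x={x:.3f}, y={y:.3f}")
```

Under this guess, a document that never mentions an axis term sits at 0 on that axis, so most documents would pile up on the axes, which is part of why I suspect the real function is something else.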

Besides, I don't understand why we need to provide a category in this case. I understand that the category is used to color the points, but is it used for anything else? If that is all it does, requiring it seems like a hard constraint on the Document-Based Scatterplot for something one may not need. But I guess I am missing something.

Mar 27 '19 21:03 fredguth

Related to the previous question: how can I find out which term was used for each axis?

Mar 31 '19 11:03 fredguth