Multi label support

Open khaledJabr opened this issue 5 years ago • 6 comments

Thank you for your great work. I'm very excited to incorporate it in my projects. One thing I am trying to understand better is how far the library goes with multi-label classification problems, and whether there are any plans to support them in the future.

khaledJabr avatar Aug 31 '20 22:08 khaledJabr

We don't currently have a multilabel type, but it's on the roadmap. What sort of problem do you have in mind?

In the meantime, you might be able to just add a MultiLabel type to types.py and modify the existing multiclass visualization so that it doesn't assume probabilities sum to 1.
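
To illustrate why a multi-label output cannot be treated as a distribution that sums to 1, here is a minimal sketch in plain Python. It does not use any LIT API: the function names, the topic labels, and the 0.5 threshold are all assumptions made for illustration. Each label gets its own independent sigmoid score, so several labels can be "on" at once.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_scores(logits: Dict[str, float]) -> Dict[str, float]:
    """Score each label independently (hypothetical helper, not part of LIT).

    Unlike a softmax over classes, these scores are not normalized
    across labels, so their sum is generally not 1.
    """
    return {label: sigmoid(z) for label, z in logits.items()}


def predicted_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score clears the (assumed) threshold."""
    return sorted(label for label, p in scores.items() if p >= threshold)


# Made-up logits for three topics.
logits = {"sports": 2.0, "politics": 1.0, "science": -3.0}
scores = multilabel_scores(logits)
labels = predicted_labels(scores)
```

Here two labels pass the threshold and the scores add up to more than 1, which is exactly the case a visualization built for single-label multiclass output would need to tolerate.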

iftenney avatar Aug 31 '20 22:08 iftenney

It's a topic model framed as a multi-label classification problem, so you can think of labels as topics, and a data point can belong to multiple topics (though not many, 3-4 at most). Thank you for your suggestion, I will try that.

khaledJabr avatar Aug 31 '20 23:08 khaledJabr

Hmm - if the label space is very sparse, you could also try one of the following:

  • Use GeneratedText to return something like " ".join(labels) for each example. There's a token-level diff in the generated text output, which would show you matched or mismatched labels. (And you could define your own metrics and add them here: https://github.com/PAIR-code/lit/blob/main/lit_nlp/app.py#L313)
  • Use TokenTopKPreds to return a list of (label, score) tuples. Normally this is a List[List[Tuple[str, float]]] which gives language model predictions (= labels from a large set), but you could have the outer list just be length 1 and it'll display on the first token. You could also work off of this visualization to make it a little better suited to whole-sentence labels; see here for the code: https://github.com/PAIR-code/lit/blob/main/lit_nlp/client/modules/lm_prediction_module.ts
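
The two output shapes described above can be sketched in plain Python as follows. This is only an illustration of the data structures: the helper names are hypothetical, the scores are made up, and how the values are wired into a model's GeneratedText or TokenTopKPreds output field is not shown here.

```python
from typing import Dict, List, Tuple


def labels_as_text(labels: List[str]) -> str:
    """Join labels into one space-separated string (hypothetical helper).

    Sorting is an added assumption so that the same label set always
    yields the same string, which keeps a token-level diff stable.
    """
    return " ".join(sorted(labels))


def labels_as_topk(scores: Dict[str, float], k: int = 4) -> List[List[Tuple[str, float]]]:
    """Pack the k best (label, score) pairs into an outer list of length 1.

    The result has the List[List[Tuple[str, float]]] shape mentioned
    above, with all predictions attached to a single first position.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [ranked[:k]]


# Made-up scores for one example.
example_scores = {"sports": 0.9, "politics": 0.2, "science": 0.5}
text_output = labels_as_text(["sports", "science"])
topk_output = labels_as_topk(example_scores, k=2)
```

Note that the joined-string approach only works cleanly when label names contain no spaces, since each label needs to stay a single token for the diff to line up.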

Hope this is helpful!

iftenney avatar Aug 31 '20 23:08 iftenney

@iftenney this is very helpful. Thank you

khaledJabr avatar Sep 01 '20 00:09 khaledJabr

Hi, I would like to use LIT for a sentiment analysis use case with 3 or more classes, for example sentiment analysis on a Twitter dataset using GloVe or word2vec embeddings. Is this possible with the present capabilities of LIT?

dsvrsec avatar Sep 07 '20 14:09 dsvrsec

@kumarvrsec Does each tweet have a single class, or can multiple labels apply to one example? If there is just a single class per example, then it's within the current capabilities. If it is multi-label per example, then the same advice/extensions mentioned by @iftenney above would be necessary.

jameswex avatar Sep 08 '20 11:09 jameswex