Simple Example uses non-existent PMI argument

Open polm opened this issue 3 years ago • 2 comments

Thanks for working on this package. I was updating the entry in the spaCy Universe (https://github.com/explosion/spaCy/pull/11937#pullrequestreview-1208010525) and we noticed that the sample here uses an argument that doesn't seem to work with the latest release.

https://github.com/JasonKessler/scattertext/blob/8ddff82f670aa2ed40312b2cdd077e7f0a98a873/simple.py#L19

polm avatar Dec 07 '22 09:12 polm

Thanks for pointing this out and for including Scattertext in the spaCy Universe. I'm preparing to deprecate the produce_scattertext_html function, and I think it would be best if the spaCy Universe page included a Scattertext example that uses more of the available features and renders a more interactive UI. For example:

import scattertext as st
import spacy

nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')

df = st.SampleCorpora.ConventionData2012.get_data().assign(
    parse=lambda df: df.text.apply(nlp)
)

corpus = st.CorpusFromParsedDocuments(
    df, 
    category_col='party', 
    parsed_col='parse'
).build().get_stoplisted_unigram_corpus().compact(st.AssociationCompactor(2000))

html = st.produce_scattertext_explorer(
    corpus,
    category='democrat', 
    category_name='Democratic', 
    not_category_name='Republican',
    minimum_term_frequency=0, 
    pmi_threshold_coefficient=0,
    width_in_pixels=1000, 
    metadata=lambda corpus: corpus.get_df()['speaker'],
    transform=st.Scalers.dense_rank
)
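# Write the page as UTF-8 so non-ASCII characters in the speeches survive on any platform.
with open('./demo_compact.html', 'w', encoding='utf-8') as of:
    of.write(html)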

Regardless, I'll update the package to ensure the pmi_filter_thresold argument still works.
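
For readers wondering what keeping a deprecated keyword working can look like, here is a minimal sketch of a compatibility shim. It is not Scattertext's actual implementation: `render_plot` is a hypothetical stand-in function, its default of 6 is an arbitrary placeholder, and the mapping of `pmi_filter_thresold` onto `pmi_threshold_coefficient` is an assumption made for illustration.

```python
import warnings


def render_plot(corpus, pmi_threshold_coefficient=6, **kwargs):
    """Hypothetical stand-in for a plotting function; not Scattertext's API."""
    # Accept the old keyword (spelled as in the thread) so existing scripts keep running.
    if 'pmi_filter_thresold' in kwargs:
        warnings.warn(
            'pmi_filter_thresold is deprecated; use pmi_threshold_coefficient',
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        # Assumption: the old keyword simply maps onto the new one.
        pmi_threshold_coefficient = kwargs.pop('pmi_filter_thresold')
    # Anything left over is a genuinely unknown keyword, so fail loudly.
    if kwargs:
        raise TypeError(f'unexpected keyword arguments: {sorted(kwargs)}')
    # Return the resolved settings so the behaviour is easy to inspect.
    return {
        'corpus': corpus,
        'pmi_threshold_coefficient': pmi_threshold_coefficient,
    }
```

Calling `render_plot(corpus, pmi_filter_thresold=2)` in this sketch emits a `DeprecationWarning` and behaves like `render_plot(corpus, pmi_threshold_coefficient=2)`, while an unrecognised keyword still raises a `TypeError`.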

JasonKessler avatar Dec 08 '22 07:12 JasonKessler

Ah, thanks for the info about the example! We've already merged the PR I linked to, but if you'd like to update the Universe entry we'd be happy to look at a PR any time. (That said, we're currently working on our website backend, so any updates in the immediate future won't go live for a bit.)

polm avatar Dec 08 '22 09:12 polm