Simple Example uses non-existent PMI argument #122

Open
polm opened this issue Dec 7, 2022 · 2 comments

polm commented Dec 7, 2022

Thanks for working on this package. I was updating the entry in the spaCy Universe (explosion/spaCy#11937 (review)) and we noticed that the sample here uses an argument that doesn't seem to work with the latest release:

pmi_filter_thresold=4,

JasonKessler (Owner) commented Dec 8, 2022

Thanks for pointing this out and for including Scattertext in the spaCy Universe. I'm preparing to deprecate the produce_scattertext_html function, so I think it would be best for the spaCy Universe page to show an example that uses more of Scattertext's features and renders a more interactive UI. For example:

import scattertext as st
import spacy

nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')

df = st.SampleCorpora.ConventionData2012.get_data().assign(
    parse=lambda df: df.text.apply(nlp)
)

corpus = st.CorpusFromParsedDocuments(
    df, 
    category_col='party', 
    parsed_col='parse'
).build().get_stoplisted_unigram_corpus().compact(st.AssociationCompactor(2000))

html = st.produce_scattertext_explorer(
    corpus,
    category='democrat', 
    category_name='Democratic', 
    not_category_name='Republican',
    minimum_term_frequency=0, 
    pmi_threshold_coefficient=0,
    width_in_pixels=1000, 
    metadata=lambda corpus: corpus.get_df()['speaker'],
    transform=st.Scalers.dense_rank
)
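# Write the generated page with an explicit encoding so non-ASCII terms survive on any platform.
with open('./demo_compact.html', 'w', encoding='utf-8') as out_file:
    out_file.write(html)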

Regardless, I'll update the package to ensure the pmi_filter_thresold argument still works.
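
For anyone curious what keeping a legacy keyword working could look like, here is a minimal sketch. It is not Scattertext's actual code: the function name `produce_html_sketch`, its default value, and the idea that `pmi_filter_thresold` maps onto `pmi_threshold_coefficient` are all assumptions made for illustration.

```python
import warnings


def produce_html_sketch(corpus, pmi_threshold_coefficient=8, **kwargs):
    """Hypothetical stand-in for a plotting function; not Scattertext's API.

    Accepts the legacy keyword `pmi_filter_thresold` (spelling as used in
    the old example) and forwards its value to `pmi_threshold_coefficient`.
    The default of 8 is an arbitrary placeholder chosen for this sketch.
    """
    if 'pmi_filter_thresold' in kwargs:
        # Assumed mapping: the legacy keyword overrides the newer one.
        warnings.warn(
            "pmi_filter_thresold is deprecated; "
            "use pmi_threshold_coefficient instead",
            DeprecationWarning,
            stacklevel=2,
        )
        pmi_threshold_coefficient = kwargs.pop('pmi_filter_thresold')
    if kwargs:
        # Anything else is still rejected, as a normal signature would do.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    # Return the resolved settings instead of real HTML to keep this small.
    return {'pmi_threshold_coefficient': pmi_threshold_coefficient}


# The old call style keeps working and emits a DeprecationWarning.
settings = produce_html_sketch(None, pmi_filter_thresold=4)
```

With an alias like this, existing samples that pass the old keyword would continue to run while callers get a nudge toward the newer name.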

polm (Author) commented Dec 8, 2022

Ah, thanks for the info about the example! We've already merged the PR I linked to, but if you'd like to update the Universe entry we'd be happy to look at a PR any time. (That said, we're currently working on our website backend, so any updates in the immediate future won't go live for a bit.)
