Segmentation fault from rake.apply function #49

Open
birgitbartels opened this issue Dec 29, 2023 · 0 comments

Hello everyone,

I want to use the multi_rake keyword extractor, but my script keeps crashing with a segmentation fault. The crash seems to come from the line "keywords = rake.apply(text=text)".

I created a class that wraps the Rake extractor and then tried to use it on a short Dutch text:

from multi_rake import Rake

class RakeKeywordExtractor():

    def __init__(self):
        # These are the default values, but we might want to adapt them!
        self.rake = Rake()

    def get_keywords(self, text, limit=None):
        if limit:
            keywords = self.rake.apply(text=text)
            return keywords[:limit]
        
        else:
            return self.rake.apply(text=text)
        
keyword_extractor = RakeKeywordExtractor()


tekst = """
De oorzaak van aften is niet bekend. We denken dat ze makkelijker ontstaan bij 1 of meer van deze dingen:

kleine wondjes in uw mond, bijvoorbeeld door:
bijten op uw wang
tandenpoetsen of flossen
een kunstgebit dat niet goed past
droge mond
stress
veranderingen in hormonen, bijvoorbeeld door ongesteld zijn of zwanger zijn
erfelijke aanleg: dit betekent dat veel mensen in uw familie aften hebben
heel soms bij te weinig ijzer, vitamine B12, of foliumzuur in uw bloed.
heel soms zijn aften een bijwerking van medicijnen
Bijvoorbeeld van sterke pijnstillers (fentanyl) of medicijnen bij kanker.
Er is geen bewijs dat deze dingen aften veroorzaken.
"""

keywords = keyword_extractor.get_keywords(tekst)
print("These are the keywords:")
for keyword in keywords:
    print(keyword)

I enabled faulthandler to get more information about the segmentation fault and got this output:

Fatal Python error: Segmentation fault

Current thread 0x00000001ddd42080 (most recent call first):
  File "...venv/lib/python3.11/site-packages/multi_rake/utils.py", line 14 in detect_language
  File "...venv/lib/python3.11/site-packages/multi_rake/algorithm.py", line 62 in apply
  File "...backend/src/services/keyword_extraction/rake.py", line 18 in get_keywords
  File "...backend/src/services/keyword_extraction/rake.py", line 40 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pvectorc, pycld2._pycld2, regex._regex (total: 16)
[1]    12942 segmentation fault  venv/bin/python -Xfaulthandler 
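
For reference, the trace above came from running Python with -X faulthandler. The same handler can also be enabled from inside the script (or by setting the PYTHONFAULTHANDLER environment variable). Below is a minimal sketch using only the standard library; the helper name enable_fault_handler and its log_path parameter are made up for illustration and are not part of multi_rake.

```python
import faulthandler


def enable_fault_handler(log_path=None):
    """Enable faulthandler; optionally write tracebacks to a file.

    Returns the open log file (keep a reference to it so it stays open),
    or None when tracebacks go to stderr.
    """
    if log_path is None:
        faulthandler.enable()  # default target is sys.stderr
        return None
    # Hypothetical log location chosen by the caller.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    enable_fault_handler()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

Calling this before the first rake.apply call should produce the same kind of "most recent call first" trace as the command-line flag.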

According to the traceback, the crash happens inside the detect_language function (multi_rake/utils.py, line 14).
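
One way to narrow this down would be to run a suspect call in a separate interpreter and check whether that child process is killed by SIGSEGV. The sketch below uses only the standard library. The helper run_isolated and the PROBE snippet are hypothetical, and the guess that pycld2 (which appears in the extension module list above) is the call worth probing is an assumption, not something the traceback proves.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter and report how it exited."""
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    crashed = result.returncode == -signal.SIGSEGV
    return crashed, result.returncode, result.stderr


# Hypothetical probe: call pycld2 directly on one Dutch sentence.
PROBE = (
    "import pycld2; "
    "print(pycld2.detect('De oorzaak van aften is niet bekend.'))"
)

if __name__ == "__main__":
    crashed, code, err = run_isolated(PROBE)
    print("segfault" if crashed else f"exit code {code}")
    print(err)
```

If the probe alone crashes, the problem would sit in the language-detection extension rather than in the RakeKeywordExtractor class; if it does not, the input text or the way multi_rake calls the detector would be the next thing to look at.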

Does anyone know what is causing this segmentation fault and how I can resolve it?

Thank you!

Kind regards,

Birgit Bartels
