Comparing a file with itself raises IndexError #15

Open
nitinsurya opened this issue Jun 10, 2020 · 2 comments

Comments

@nitinsurya

Running a command like:

text-matcher tmp.txt tmp.txt

raises an IndexError. Sample stack trace:

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    267             # If we've gone through the whole list and there's nothing
    268             # left to extend, then stop. Otherwise do this again.
--> 269             self.extend_matches()
    270
    271         return self.healed_matches

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    267             # If we've gone through the whole list and there's nothing
    268             # left to extend, then stop. Otherwise do this again.
--> 269             self.extend_matches()
    270
    271         return self.healed_matches

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    237         for match in self.healed_matches:
    238             # Look one word before.
--> 239             wordA = self.textAgrams[(match.a - 1)][0]
    240             wordB = self.textBgrams[(match.b - 1)][0]
    241             if self.edit_ratio(wordA, wordB) < cutoff:

IndexError: list index out of range

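The error is possibly caused by this line: https://github.com/JonathanReeve/text-matcher/blob/master/text_matcher/matcher.py#L239. When match.a is 0, the evaluated statement becomes:
wordA = self.textAgrams[-1][0]
In Python, index -1 wraps around to the last element instead of failing, so extend_matches appears to keep recursing until the index finally goes out of range.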
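
To illustrate the wrap-around, here is a minimal sketch. The `grams` list is a hypothetical stand-in for `self.textAgrams` (assumed to be a list of word tuples), and `first_failing_offset` is a helper made up for this example, not part of text-matcher.

```python
# Hypothetical stand-in for self.textAgrams: a list of n-grams (tuples of words).
grams = [("the", "quick"), ("quick", "brown"), ("brown", "fox")]

start = 0  # plays the role of match.a at the very start of the text

# Index -1 does not fail: it silently returns the last n-gram in the list.
wrapped_word = grams[start - 1][0]


def first_failing_offset(items, start):
    """Step backwards from `start` and return the first index that raises IndexError."""
    offset = start
    while True:
        offset -= 1
        try:
            items[offset]
        except IndexError:
            return offset


print(wrapped_word)
print(first_failing_offset(grams, start))
```

With three n-grams, indices -1, -2 and -3 are all valid, so an IndexError only shows up once the index has walked back past the start of the list by its full length.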

@JonathanReeve
Owner

Thanks for catching this, and for looking into it for me.
My best guess at a fix is to check that the two texts are not identical before starting the matching, and if they are, to exit with a message saying so rather than trying to match them. But this may not be the best way to go. Any ideas?
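
A rough sketch of what such a pre-check might look like. The function name `should_run_matcher` and the message wording are assumptions for illustration; nothing here is taken from the text-matcher code.

```python
def should_run_matcher(text_a: str, text_b: str) -> bool:
    """Return False, with a short notice, when the two texts are identical.

    Hypothetical helper: the caller would skip matching when this returns False.
    """
    if text_a == text_b:
        print("The texts are identical; skipping matching.")
        return False
    return True


# Identical inputs are rejected up front; different inputs go on to matching.
print(should_run_matcher("some words here", "some words here"))
print(should_run_matcher("some words here", "other words here"))
```

Note that a check like this only covers fully identical inputs. Two different texts that share a match at position 0 would still reach line 239 with match.a equal to 0.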

@nitinsurya
Author

From my side, I feel this is more of a Python bug fix, because here we are unintentionally going from position 0 to position -1 of a text.

So instead, I would suggest that in the package, before executing
--> 239 wordA = self.textAgrams[(match.a - 1)][0]

we check that match.a > 0 and match.b > 0, and only then continue.

The rest of the code seems to handle the situation well.
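
A minimal sketch of that guard, written as a standalone function so it can be run on its own. `words_before` and its parameters are hypothetical names; in the package the check would presumably sit inside `extend_matches`, directly around line 239.

```python
def words_before(grams_a, grams_b, a, b):
    """Return (wordA, wordB), the first words of the n-grams just before a and b.

    Returns None when either position is already at the start of its text,
    so the caller never indexes with -1.
    """
    if a > 0 and b > 0:
        return grams_a[a - 1][0], grams_b[b - 1][0]
    return None


# Hypothetical n-gram lists standing in for self.textAgrams / self.textBgrams.
grams = [("the", "quick"), ("quick", "brown"), ("brown", "fox")]

print(words_before(grams, grams, 0, 0))  # at the start: nothing to look at
print(words_before(grams, grams, 1, 1))  # one word back is available
```

Compared with rejecting identical files up front, this keeps a file-against-itself comparison working and also covers different texts whose match begins at position 0.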

aizdorovets pushed a commit to aizdorovets/text-matcher that referenced this issue Mar 22, 2023