Comparing a file with itself raises IndexError #15

nitinsurya · 2020-06-10T06:05:28Z

Running a command like:

text-matcher tmp.txt tmp.txt

raises IndexError. Sample stack trace:

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    267             # If we've gone through the whole list and there's nothing
    268             # left to extend, then stop. Otherwise do this again.
--> 269             self.extend_matches()
    270
    271         return self.healed_matches

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    267             # If we've gone through the whole list and there's nothing
    268             # left to extend, then stop. Otherwise do this again.
--> 269             self.extend_matches()
    270
    271         return self.healed_matches

.../text_matcher/matcher.py in extend_matches(self, cutoff)
    237         for match in self.healed_matches:
    238             # Look one word before.
--> 239             wordA = self.textAgrams[(match.a - 1)][0]
    240             wordB = self.textBgrams[(match.b - 1)][0]
    241             if self.edit_ratio(wordA, wordB) < cutoff:

IndexError: list index out of range

The error is possibly caused by this line: https://github.com/JonathanReeve/text-matcher/blob/master/text_matcher/matcher.py#L239. When match.a is 0, the evaluated statement becomes:
wordA = self.textAgrams[-1][0]
which appears to cause an infinite loop.
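
For illustration, here is a minimal sketch of why index -1 does not fail right away in Python. The `grams` list is a made-up stand-in for `self.textAgrams`, not the package's actual data:

```python
# Hypothetical stand-in for self.textAgrams: a list of n-grams (tuples of words).
grams = [("the", "quick"), ("quick", "brown"), ("brown", "fox")]

a = 0  # a match that starts at the very beginning of the text

# Looking "one word before" position 0 uses index -1, which Python
# resolves to the LAST element instead of raising an error.
word_before = grams[a - 1][0]
print(word_before)  # brown

# Only an index below -len(grams) is actually out of range.
try:
    grams[-len(grams) - 1]
except IndexError as err:
    print("IndexError:", err)
```

So a match starting at position 0 can silently wrap around to the end of the text, and an IndexError only shows up once the index has moved past the other end of the list.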

JonathanReeve · 2020-06-10T21:27:42Z

Thanks for catching this, and for looking into it for me.
My best guess at a fix is to check that the two texts are not identical before starting the matching, and if they are, to exit with a message saying so rather than trying to match them. But this may not be the best way to go. Any ideas?
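
A rough sketch of what such a pre-check could look like. The function names `texts_are_identical` and `compare` are hypothetical and not part of text-matcher:

```python
def texts_are_identical(text_a: str, text_b: str) -> bool:
    """Return True when both texts have exactly the same content."""
    return text_a == text_b


def compare(text_a: str, text_b: str) -> str:
    """Hypothetical entry point: bail out early instead of matching."""
    if texts_are_identical(text_a, text_b):
        return "The texts are identical; nothing to match."
    # The real matching would start here.
    return "Proceeding with matching."


print(compare("same words", "same words"))
print(compare("same words", "other words"))
```

A check like this only catches exact duplicates. It would presumably not help when two different texts share a match that starts at position 0.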

nitinsurya · 2020-06-13T00:22:59Z

From my side, I feel this is more of a Python code bug fix, because here we are unintentionally going from position 0 to position -1 of a text.

So instead, I would say that in the package, before doing
--> 239 wordA = self.textAgrams[(match.a - 1)][0]

we check that match.a > 0 and match.b > 0, and only then continue.

The rest of the code seems to handle the situation well.
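
A minimal, self-contained sketch of that guard, under stated assumptions: `Match`, `edit_ratio`, and `extend_backward` below are simplified stand-ins written for illustration, not the actual code in matcher.py.

```python
from collections import namedtuple
from difflib import SequenceMatcher

# Hypothetical, simplified match record: start positions in A and B, plus length.
Match = namedtuple("Match", ["a", "b", "size"])


def edit_ratio(word_a, word_b):
    """Stand-in dissimilarity score: 0.0 for identical words, 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, word_a, word_b).ratio()


def extend_backward(match, grams_a, grams_b, cutoff=0.4):
    """Grow a match one word to the left, but never past the start of either text."""
    if match.a > 0 and match.b > 0:  # the proposed guard
        word_a = grams_a[match.a - 1][0]
        word_b = grams_b[match.b - 1][0]
        if edit_ratio(word_a, word_b) < cutoff:
            return Match(match.a - 1, match.b - 1, match.size + 1)
    return match


grams = [("the", "quick"), ("quick", "brown"), ("brown", "fox")]
print(extend_backward(Match(0, 0, 3), grams, grams))  # Match(a=0, b=0, size=3)
print(extend_backward(Match(1, 1, 2), grams, grams))  # Match(a=0, b=0, size=3)
```

With the guard in place, a match that already starts at position 0 is returned unchanged, so the lookup never reaches index -1.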

aizdorovets pushed a commit to aizdorovets/text-matcher that referenced this issue Mar 22, 2023

When the texts are the same or close, there is IndexError. Per Jonath…

de99008

…anReeve#15, add a check. seems to work idk
