-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text 158 #112
base: master
Are you sure you want to change the base?
Text 158 #112
Conversation
This also includes the fix for TEXT-86 #106. Can you remove that so the only files changed are for the |
@@ -37,7 +37,7 @@ public static void setUp() { | |||
@Test | |||
public void testGettingJaccardDistance() { | |||
// Expected Jaccard distance = 1.0 - (intersect / union) | |||
assertEquals(1.0, classBeingTested.apply("", "")); | |||
assertEquals(0.0, classBeingTested.apply("", "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works because the unit test passes in the same string on both sides, i.e. "" vs ""
.
Do not use == to measure equality. This measures the same object reference. Try this in your test:
// Change
assertEquals(0.0, classBeingTested.apply("", ""));
// to
assertEquals(0.0, classBeingTested.apply("", new StringBuilder()));
You should fix the code to use StringUtils.equals(CharSequence, CharSequence)
to test for equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, but
- Operator == is used to measure two int values (length).
- with
new StringBuilder()
also works well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you are right. It was my mistake carried over from fixing another similarity class to use StringUtils.equals
.
However you should consider using StringUtils.equals
before splitting all the sequences into a set to provide a fast exit in the case of equal input. This is done in JaroWinklerSimilarity
.
Hi @aherbert it looks like Travis is failing in jdk 12 for a reason that is not related to this change:
can you please have a look and advise how to move forward? |
I do not know why This has just been fixed on master by @garydgregory to use The current travis results pass JDK 8 and 11 so this is not an issue. |
@@ -37,7 +37,7 @@ public static void setUp() { | |||
@Test | |||
public void testGettingJaccardDistance() { | |||
// Expected Jaccard distance = 1.0 - (intersect / union) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this comment is out of sync with the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, can you describe why this comment looks wrong? For me it's good, shows the formula, as well as in JaccardSimilarityTest.java
// Expected Jaccard similarity = (intersect / union)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the comment reads:
// Expected Jaccard distance = 1.0 - (intersect / union)
and the next line is in contradiction with this comment, expecting 0.0, not 1.0.
assertEquals(0.0, classBeingTested.apply("", ""));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @garydgregory , I'm still not convinced.
Intersection of two empty sets is empty set, union is also empty set. They are equal. So Jaccard similarity is 1. And therefore distance is 0.
Fix Jaccard similarity with empty strings:
Due to bug, Jaccard similarity for two empty strings returned 0.0, and the distance 1.0, now it returns similarity 1.0, and the distance 0.0