
How to get similarity score with 2 sentences test #2

briancannon opened this issue Nov 9, 2017 · 9 comments

@briancannon

The model's output is a torch.cuda.FloatTensor. How can I get the actual similarity score between 2 sentences?

@tuzhucheng (Owner)

Check out this line:

predictions.append((predict_classes * output.data.exp()).sum(dim=1))
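In case it helps, here is a minimal sketch of what that line is doing, assuming the model's forward pass returns log-probabilities over the discrete relatedness classes 1..5 (which is what the KL-divergence loss and output.data.exp() suggest); the function name and shapes below are illustrative, not this repo's API:

import torch

def similarity_score(log_probs):
    # log_probs: (batch, num_classes) log-probabilities, e.g. log_softmax output
    num_classes = log_probs.size(1)  # 5 for SICK relatedness scores 1..5
    classes = torch.arange(1, num_classes + 1,
                           device=log_probs.device, dtype=log_probs.dtype)
    probs = log_probs.exp()              # back to probabilities
    return (probs * classes).sum(dim=1)  # expected score, i.e. the "real" similarity

# e.g. score = similarity_score(model(sent_a, sent_b)) gives one score per pair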

@briancannon (Author)

I tried that out and got the score.
But when I split the test data set into smaller sets (64 sentence pairs each) and evaluate each of them, I get different results:

INFO - Evaluation metrics for test
INFO - pearson_r spearman_r KL-divergence loss
INFO - test 0.587159 0.65102088053 1.398514747619629

INFO - Evaluation metrics for test
INFO - pearson_r spearman_r KL-divergence loss
INFO - test -0.0634823 -0.0976152631988 1.9832178354263306

INFO - Evaluation metrics for test
INFO - pearson_r spearman_r KL-divergence loss
INFO - test 0.680005 0.517980672901 1.0506935119628906

Why is that? Is the model correct?

@tuzhucheng (Owner)

You mean evaluating batches of the test set, each consisting of 64 sentence pairs?

@briancannon (Author)

Yes. I just want to evaluate on different, smaller test data sets, not in any particular order or anything like that.

@tuzhucheng (Owner)

You showed three different sets of "Evaluation metrics for test". I'm guessing you are wondering why the results differ so much.

Do you mind explaining what you did to get the pearson_r, spearman_r, etc. for those three sets of data?

@briancannon (Author)

You're right, and that's why I'm wondering.
The test data set has more than 4000 sentence pairs. I tried to evaluate on 3 smaller data sets, each with 64 sentence pairs, and got different pearson_r and spearman_r results.

Could you explain this to me? Thanks.

@tuzhucheng (Owner)

How many epochs did you train for?

If the model is not trained very well (high bias on the training set), then we can expect poor results on the smaller test sets, and they will vary wildly because there is variation among the different small test sets you created. However, once the model is trained properly (low bias on the training and dev sets), I think you can expect better test set metrics and more consistent performance across different test sets. Note that for the model to train well, the hyperparameters also play an extremely important role.
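As a rough illustration of that variance point (with synthetic numbers, not the actual SICK predictions), correlation metrics computed on 64-pair chunks fluctuate much more than the same metric on the full ~4000-pair test set:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 4000
gold = rng.uniform(1, 5, size=n)          # stand-in gold relatedness scores
pred = gold + rng.normal(0, 1.0, size=n)  # stand-in noisy model predictions

print("full test set pearson_r:", pearsonr(gold, pred)[0])
for _ in range(3):
    idx = rng.choice(n, size=64, replace=False)  # a random 64-pair chunk
    print("64-pair chunk pearson_r:", pearsonr(gold[idx], pred[idx])[0])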

@briancannon (Author)

I trained with:
python main.py mpcnn.sick.model --dataset sick --epochs 19 --epsilon 1e-7 --dropout 0
And got (full test data set):
INFO - Evaluation metrics for test
INFO - pearson_r spearman_r KL-divergence loss
INFO - test 0.867389 0.808621796372 0.46649816802241434

You can use split -l 64 a.txt split_a.txt, then randomly select one of the resulting files to evaluate and see the result.
I tried this because when printing predictions.append((predict_classes * output.data.exp()).sum(dim=1)), I found the similarity scores are quite different from the expected results.

@tuzhucheng (Owner)

Hmm, sorry I missed the notification.

Doing some error analysis is on my TODO list.
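A simple starting point could be sorting pairs by the gap between the predicted and gold scores; the names and values below are hypothetical placeholders (a sketch, not code from this repo):

# hypothetical placeholders; fill with your per-pair scores from evaluation
predicted = [4.2, 1.3, 3.8]
gold = [4.9, 1.1, 2.0]

pairs = sorted(zip(predicted, gold), key=lambda pg: abs(pg[0] - pg[1]), reverse=True)
for pred_score, gold_score in pairs[:10]:  # up to ten largest prediction/gold gaps
    print("predicted=%.2f  gold=%.2f  gap=%.2f"
          % (pred_score, gold_score, abs(pred_score - gold_score)))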
