
Are you using the test dataset in the training process? #37

Open · kunwuz opened this issue Nov 30, 2019 · 4 comments
kunwuz commented Nov 30, 2019

    # print the test evaluation metrics each 10 epochs; pos:neg = 1:10.
    if (epoch + 1) % 10 != 0:
        if args.verbose > 0 and epoch % args.verbose == 0:
            perf_str = 'Epoch %d [%.1fs]: train==[%.5f=%.5f + %.5f]' % (
                epoch, time() - t1, loss, mf_loss, reg_loss)
            print(perf_str)
        continue

    t2 = time()
    users_to_test = list(data_generator.test_set.keys())
    ret = test(sess, model, users_to_test, drop_flag=True)

    t3 = time()

    loss_loger.append(loss)
    rec_loger.append(ret['recall'])
    pre_loger.append(ret['precision'])
    ndcg_loger.append(ret['ndcg'])
    hit_loger.append(ret['hit_ratio'])

    if args.verbose > 0:
        perf_str = 'Epoch %d [%.1fs + %.1fs]: train==[%.5f=%.5f + %.5f + %.5f], recall=[%.5f, %.5f], ' \
                   'precision=[%.5f, %.5f], hit=[%.5f, %.5f], ndcg=[%.5f, %.5f]' % \
                   (epoch, t2 - t1, t3 - t2, loss, mf_loss, emb_loss, reg_loss, ret['recall'][0], ret['recall'][-1],
                    ret['precision'][0], ret['precision'][-1], ret['hit_ratio'][0], ret['hit_ratio'][-1],
                    ret['ndcg'][0], ret['ndcg'][-1])
        print(perf_str)

    cur_best_pre_0, stopping_step, should_stop = early_stopping(ret['recall'][0], cur_best_pre_0,
                                                                stopping_step, expected_order='acc', flag_step=5)

    # *********************************************************
    # early stopping when cur_best_pre_0 has not improved for flag_step (5) successive evaluations.
    if should_stop == True:
        break

    # *********************************************************

Could you explain why you are using "users_to_test" in the early stopping? I'm really puzzled by the code pasted above.

ghost commented Jun 30, 2020

I think the author just used test data to determine whether the training was over in each epoch. If you don't want to do this, you can put the test code in a test function and call it again after the train finishes.

kunwuz commented Jun 30, 2020

> I think the author just used test data to determine whether the training was over in each epoch. If you don't want to do this, you can put the test code in a test function and call it again after the train finishes.

Yes, that's true. But it is not correct practice in general ML: the criterion for stopping training should be performance on a validation set, not on the test set.
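To make the point concrete, here is a minimal sketch of early stopping driven by *validation* recall instead of test recall. The `early_stopping` helper is re-implemented here for illustration with the same call signature as in the snippet above; the repo's actual helper may differ in detail, and the per-epoch recall values are made up.

```python
def early_stopping(log_value, best_value, stopping_step, expected_order='acc', flag_step=5):
    """Return (best_value, stopping_step, should_stop).

    expected_order='acc' means higher is better (e.g. recall);
    'dec' means lower is better (e.g. loss).
    """
    improved = (expected_order == 'acc' and log_value >= best_value) or \
               (expected_order == 'dec' and log_value <= best_value)
    if improved:
        return log_value, 0, False          # reset the patience counter
    stopping_step += 1                      # no improvement this evaluation
    return best_value, stopping_step, stopping_step >= flag_step

# Toy loop: the stopping signal comes from held-out *validation* recall only.
# The test set would be evaluated once, after this loop ends.
val_recall_per_epoch = [0.10, 0.12, 0.13, 0.13, 0.12, 0.11, 0.12, 0.10, 0.09, 0.08]
cur_best, stopping_step, stopped_at = 0.0, 0, None
for epoch, recall in enumerate(val_recall_per_epoch):
    cur_best, stopping_step, should_stop = early_stopping(
        recall, cur_best, stopping_step, expected_order='acc', flag_step=5)
    if should_stop:
        stopped_at = epoch
        break

print(cur_best, stopped_at)  # best validation recall and the epoch where training stopped
```

The loop keeps the best validation score seen so far and stops after `flag_step` consecutive evaluations without improvement; the test set plays no role in the decision.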

ghost commented Jul 13, 2020

I think it's better to call this a validation set than testing data.

kunwuz commented Jul 13, 2020

> I think it's better to call this a validation set than testing data.

Then where is the evaluation script, if that's just the validation set? Based on how the dataset is split in the code, there is no validation set, if I understand correctly. Feel free to point me to the code if I've missed something critical.
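If the repo only ships train/test splits, one way to get a proper validation set is to carve it out of the training interactions per user. This is a hedged sketch, not the repo's code: the `train_set` dict-of-item-lists shape and the `split_validation` helper are illustrative assumptions, not the project's actual data structures.

```python
import random

def split_validation(train_set, val_ratio=0.1, seed=42):
    """Move ~val_ratio of each user's training items into a validation set.

    train_set: dict mapping user id -> list of interacted item ids (assumed shape).
    Users with a single interaction keep it in train (nothing to hold out).
    """
    rng = random.Random(seed)
    train, valid = {}, {}
    for user, items in train_set.items():
        items = list(items)
        rng.shuffle(items)
        n_val = max(1, int(len(items) * val_ratio)) if len(items) > 1 else 0
        valid[user] = items[:n_val]
        train[user] = items[n_val:]
    return train, valid

# Toy example: user 0 has 10 interactions, user 1 has 2.
train_set = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1: [11, 12]}
train, valid = split_validation(train_set)
print({u: len(v) for u, v in valid.items()})  # one held-out item per user here
```

Early stopping would then monitor metrics on `valid`, and the original test set would be touched only once for the final reported numbers.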
