Description
- My train file has 8000 sentences, but when I ran this code it reported the number of samples as 817:
INFO:main:Creating features from dataset file at data/
8000
817
100%|██████████| 817/817 [00:01<00:00, 537.09it/s]
INFO:main:Saving features into cached file data/cached_train_bert-base-multilingual-cased_128_binary
INFO:main:***** Running training *****
INFO:main: Num examples = 817
INFO:main: Num Epochs = 35
INFO:main: Total train batch size = 8
INFO:main: Gradient Accumulation steps = 1
INFO:main: Total optimization steps = 3605
- Similarly, my test file has 2000 sentences, but when the evaluation code ran it reported Num examples = 18:
INFO:main:Evaluate the following checkpoints: ['outputs/checkpoint-2000', 'outputs']
INFO:main:Creating features from dataset file at data/
2000
18
100%|██████████| 18/18 [00:00<00:00, 148.27it/s]
INFO:main:Saving features into cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main:***** Running evaluation 2000 *****
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.02it/s]
INFO:main:***** Eval results 2000 *****
INFO:main: fn = 4
INFO:main: fp = 3
INFO:main: mcc = 0.20385887657505022
INFO:main: tn = 7
INFO:main: tp = 4
INFO:main:Loading features from cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main:***** Running evaluation outputs *****
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.51it/s]
INFO:main:***** Eval results outputs *****
INFO:main: fn = 4
INFO:main: fp = 2
INFO:main: mcc = 0.31622776601683794
INFO:main: tn = 8
INFO:main: tp = 4
- Also, the final output obtained is:
{'fn_2000': 4,
'fn_outputs': 4,
'fp_2000': 3,
'fp_outputs': 2,
'mcc_2000': 0.20385887657505022,
'mcc_outputs': 0.31622776601683794,
'tn_2000': 7,
'tn_outputs': 8,
'tp_2000': 4,
'tp_outputs': 4}
- **In total my test_df has 2000 sentences, but when I add tp + tn + fp + fn I only get 18. Could you please explain this?**
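
A quick sanity check on the numbers above, as a minimal sketch: it sums the confusion-matrix counts for each checkpoint, recomputes MCC from them, and reproduces the logged step counts. The counts are copied from the logs; the helper names (`total_examples`, `mcc`) are illustrative only, and the step formula assumes one optimizer step per batch (gradient accumulation = 1, as logged), which may not match the script exactly.

```python
import math

# Confusion-matrix counts copied from the "Eval results" log lines above.
results = {
    "2000": {"tp": 4, "tn": 7, "fp": 3, "fn": 4},
    "outputs": {"tp": 4, "tn": 8, "fp": 2, "fn": 4},
}


def total_examples(c):
    """Number of evaluated examples implied by the confusion matrix."""
    return c["tp"] + c["tn"] + c["fp"] + c["fn"]


def mcc(c):
    """Matthews correlation coefficient from raw counts."""
    num = c["tp"] * c["tn"] - c["fp"] * c["fn"]
    den = math.sqrt(
        (c["tp"] + c["fp"]) * (c["tp"] + c["fn"])
        * (c["tn"] + c["fp"]) * (c["tn"] + c["fn"])
    )
    return num / den if den else 0.0


totals = {name: total_examples(c) for name, c in results.items()}
mccs = {name: mcc(c) for name, c in results.items()}

# Assumption: steps per epoch = ceil(num_examples / batch_size).
train_steps = math.ceil(817 / 8) * 35  # 817 examples, batch size 8, 35 epochs
eval_batches = math.ceil(18 / 8)       # 18 examples, batch size 8

print(totals, mccs, train_steps, eval_batches)
```

Both checkpoints sum to 18 and the recomputed values agree with the logs, so the metrics look consistent with the 18 examples that were actually loaded. The drop from 2000 (and 8000) rows appears to happen earlier, at the "Creating features from dataset file" step.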