Regarding number of train test samples

- **My train file has 8000 sentences, but when I ran this code it reported number of samples = 817.**

INFO:__main__:Creating features from dataset file at data/
8000
817
100%|██████████| 817/817 [00:01<00:00, 537.09it/s]
INFO:__main__:Saving features into cached file data/cached_train_bert-base-multilingual-cased_128_binary
INFO:__main__:***** Running training *****
INFO:__main__:  Num examples = 817
INFO:__main__:  Num Epochs = 35
INFO:__main__:  Total train batch size  = 8
INFO:__main__:  Gradient Accumulation steps = 1
INFO:__main__:  Total optimization steps = 3605
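As a sanity check on the log above, the step count can be reproduced from the other logged values. This is a minimal sketch, assuming the script computes total optimization steps as batches per epoch (rounded up), divided by the gradient accumulation steps, times the number of epochs; the helper `total_steps` is hypothetical and not part of the training script.

```python
import math

def total_steps(num_examples, batch_size, grad_accum_steps, num_epochs):
    """Assumed formula: optimizer updates per epoch times number of epochs."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return (batches_per_epoch // grad_accum_steps) * num_epochs

# Values copied from the training log.
logged = total_steps(num_examples=817, batch_size=8, grad_accum_steps=1, num_epochs=35)
# What the same formula would give if all 8000 sentences were used.
expected = total_steps(num_examples=8000, batch_size=8, grad_accum_steps=1, num_epochs=35)

print(logged)    # 3605, matches "Total optimization steps" in the log
print(expected)  # 35000
```

Under that assumption, the logged 3605 steps match 817 examples rather than 8000, so the reduction seems to happen before training starts, while the examples are being created.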

- **Similarly, my test file has 2000 sentences, but when the evaluation code was executed it showed Num examples = 18.**

INFO:__main__:Evaluate the following checkpoints: ['outputs/checkpoint-2000', 'outputs']
INFO:__main__:Creating features from dataset file at data/
2000
18
100%|██████████| 18/18 [00:00<00:00, 148.27it/s]
INFO:__main__:Saving features into cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:__main__:***** Running evaluation 2000 *****
INFO:__main__:  Num examples = 18
INFO:__main__:  Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.02it/s]
INFO:__main__:***** Eval results 2000 *****
INFO:__main__:  fn = 4
INFO:__main__:  fp = 3
INFO:__main__:  mcc = 0.20385887657505022
INFO:__main__:  tn = 7
INFO:__main__:  tp = 4

INFO:__main__:Loading features from cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:__main__:***** Running evaluation outputs *****
INFO:__main__:  Num examples = 18
INFO:__main__:  Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.51it/s]

INFO:__main__:***** Eval results outputs *****
INFO:__main__:  fn = 4
INFO:__main__:  fp = 2
INFO:__main__:  mcc = 0.31622776601683794
INFO:__main__:  tn = 8
INFO:__main__:  tp = 4


- **Also, the final output obtained is:**

 {'fn_2000': 4,
 'fn_outputs': 4,
 'fp_2000': 3,
 'fp_outputs': 2,
 'mcc_2000': 0.20385887657505022,
 'mcc_outputs': 0.31622776601683794,
 'tn_2000': 7,
 'tn_outputs': 8,
 'tp_2000': 4,
 'tp_outputs': 4}


- **In total my test_df has 2000 sentences, but when I add tp + tn + fp + fn I only get 18. Could you please explain this?**
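The arithmetic behind this question can be checked directly from the reported values. The sketch below sums the confusion-matrix counts for both checkpoints and recomputes the Matthews correlation coefficient using its standard formula; the helper names `confusion_total` and `mcc` are illustrative and are not taken from the evaluation script.

```python
import math

def confusion_total(tp, tn, fp, fn):
    """Number of evaluated examples implied by a binary confusion matrix."""
    return tp + tn + fp + fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Counts copied from the final output above.
results = {
    "2000": dict(tp=4, tn=7, fp=3, fn=4),
    "outputs": dict(tp=4, tn=8, fp=2, fn=4),
}

for name, cm in results.items():
    # Each checkpoint sums to 18, and the recomputed MCC matches the logged value.
    print(name, confusion_total(**cm), round(mcc(**cm), 4))
```

Both checkpoints sum to 18 and the recomputed MCC values agree with the log, so the metrics are internally consistent with 18 evaluated examples, not with the 2000 sentences in the test file.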

