Regarding number of train test samples #33

Open
Sreelakshmi-k opened this issue Nov 13, 2019 · 5 comments

Comments

@Sreelakshmi-k

  • My train file has 8000 sentences, but when I ran this code it reported only 817 samples:

INFO:main:Creating features from dataset file at data/
8000
817
100%|██████████| 817/817 [00:01<00:00, 537.09it/s]
INFO:main:Saving features into cached file data/cached_train_bert-base-multilingual-cased_128_binary
INFO:main:***** Running training *****
INFO:main: Num examples = 817
INFO:main: Num Epochs = 35
INFO:main: Total train batch size = 8
INFO:main: Gradient Accumulation steps = 1
INFO:main: Total optimization steps = 3605

  • Similarly, my test file has 2000 sentences, but when the evaluation code ran it reported only 18 examples:

INFO:main:Evaluate the following checkpoints: ['outputs/checkpoint-2000', 'outputs']
INFO:main:Creating features from dataset file at data/
2000
18
100%|██████████| 18/18 [00:00<00:00, 148.27it/s]
INFO:main:Saving features into cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main:***** Running evaluation 2000 *****
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.02it/s]
INFO:main:***** Eval results 2000 *****
INFO:main: fn = 4
INFO:main: fp = 3
INFO:main: mcc = 0.20385887657505022
INFO:main: tn = 7
INFO:main: tp = 4

INFO:main:Loading features from cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main:***** Running evaluation outputs *****
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating
100% 3/3 [00:00<00:00, 7.51it/s]

INFO:main:***** Eval results outputs *****
INFO:main: fn = 4
INFO:main: fp = 2
INFO:main: mcc = 0.31622776601683794
INFO:main: tn = 8
INFO:main: tp = 4

  • Also, the final output obtained is:

{'fn_2000': 4,
'fn_outputs': 4,
'fp_2000': 3,
'fp_outputs': 2,
'mcc_2000': 0.20385887657505022,
'mcc_outputs': 0.31622776601683794,
'tn_2000': 7,
'tn_outputs': 8,
'tp_2000': 4,
'tp_outputs': 4}

  • In total, my test_df has 2000 sentences, but when I add tp + tn + fp + fn I only get 18. Could you please explain this?
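As a quick sanity check on the numbers above, the sketch below recomputes the example count and the MCC from the reported confusion-matrix entries. The helper name `mcc` is only illustrative and is not taken from the project's code; the counts are copied from the logs in this post.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Counts copied from the two evaluation runs reported above.
reported = {
    "2000": {"tp": 4, "tn": 7, "fp": 3, "fn": 4},
    "outputs": {"tp": 4, "tn": 8, "fp": 2, "fn": 4},
}

for name, counts in reported.items():
    total = sum(counts.values())  # number of examples actually evaluated
    print(name, total, mcc(**counts))
```

Both runs add up to 18 evaluated examples and reproduce the logged MCC values, so the metrics are internally consistent with "Num examples = 18". The missing sentences are therefore lost before evaluation, when the dataset is read and converted to features.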

@ThilinaRajapakse
Owner

The data might be getting loaded from the cache directory. Try deleting any cached files.

Do you have the same issue when using the yelp data?
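A minimal sketch of clearing the cached feature files, assuming they sit directly in the data directory and start with `cached_`, as the `data/cached_train_...` and `data/cached_dev_...` paths in the logs suggest. The function name `remove_cached_features` is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List

def remove_cached_features(data_dir: str, pattern: str = "cached_*") -> List[str]:
    """Delete cached feature files in data_dir and return the removed file names."""
    removed = []
    for path in sorted(Path(data_dir).glob(pattern)):
        if path.is_file():  # leave directories untouched
            path.unlink()
            removed.append(path.name)
    return removed

# Example call (assumes the data lives in "data/", as in the logs):
# remove_cached_features("data")
```

After this, the next run should rebuild the features from the dataset files instead of reusing an older cache.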

@Sreelakshmi-k
Author

Sreelakshmi-k commented Nov 13, 2019 via email

@Sreelakshmi-k
Author

Sreelakshmi-k commented Nov 13, 2019 via email

@ThilinaRajapakse
Owner

Try using the Yelp dataset as given in the guide. It's impossible to say what the issue is without seeing your data.

Or, consider using Simple Transformers as it is up to date and much easier to use.
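Nothing in this thread confirms the cause, but one thing that may be worth ruling out is quote handling when the TSV file is parsed. The sketch below uses made-up sample text (not the reporter's data) to show how Python's `csv` reader can merge several physical lines into one row when a field starts with an unbalanced `"`, and how `csv.QUOTE_NONE` avoids that. Whether the project's loader behaves this way on the reporter's file is an assumption.

```python
import csv
import io

def count_rows(text: str, **reader_kwargs) -> int:
    """Count the rows that csv.reader yields for tab-separated text."""
    stream = io.StringIO(text, newline="")
    return sum(1 for _ in csv.reader(stream, delimiter="\t", **reader_kwargs))

# Synthetic example: lines 1 and 3 start with a quote that is never closed.
sample = (
    '"He said hello\t1\n'
    "plain sentence\t0\n"
    '"another one\t1\n'
    "last sentence\t0\n"
)

physical_lines = len(sample.splitlines())
default_rows = count_rows(sample)  # the first three lines collapse into one row
unquoted_rows = count_rows(sample, quoting=csv.QUOTE_NONE)  # one row per line

print(physical_lines, default_rows, unquoted_rows)
```

Comparing the physical line count of a file with the number of parsed rows in this way is a cheap first diagnostic. If the two differ, stray quote characters in the sentences are a likely suspect.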

@Sreelakshmi-k
Author

Sreelakshmi-k commented Nov 13, 2019 via email
