Pretraining errors out on the songci and wiki2 datasets; the generated mask file wiki_train_mlNone_rs2022_mr15_mtr8_mtur5.pt is only 1 KB #14
Comments
Ran into the same problem. Did you manage to solve it?

Solved. See my other issue: https://github.com/moon-hotel/BertWithPretrained/issues/15

You'll have to debug this one yourself; I haven't run into it. It looks like your PyTorch version has an extra num_samples parameter?

It should be a problem with how the dataset is split into sentences. When running the wiki dataset, set ModelConfig.seps='.'
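The suggestion above explains the empty cache file: if the preprocessing splits each paragraph on the wrong separator character, most paragraphs yield fewer than two sentences, so no NSP pairs are produced and the training set ends up empty. The following is only an illustrative sketch (`split_sentences`, `make_nsp_samples`, and the `seps` parameter are hypothetical names, not the repo's actual API):

```python
def split_sentences(paragraph, seps):
    """Split a paragraph into sentences on the separator character."""
    return [s.strip() for s in paragraph.split(seps) if s.strip()]

def make_nsp_samples(paragraphs, seps):
    """Build next-sentence-prediction pairs; a paragraph contributes
    nothing unless it splits into at least two sentences."""
    samples = []
    for p in paragraphs:
        sentences = split_sentences(p, seps)
        for i in range(len(sentences) - 1):
            samples.append((sentences[i], sentences[i + 1]))
    return samples

paragraphs = ["The cat sat. The dog ran. Both left."]

# With a separator that never occurs in English text (e.g. the Chinese
# full stop used for the songci data), nothing splits -> empty dataset:
print(len(make_nsp_samples(paragraphs, "\u3002")))  # 0
# With seps='.', each paragraph yields adjacent sentence pairs:
print(len(make_nsp_samples(paragraphs, ".")))       # 2
```

This is why the generated `.pt` file is only about 1 KB: it serializes an empty sample list.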
Note: using the local MyMultiHeadAttention implementation from MyTransformer
[2022-11-27 15:03:35] - INFO: ## Using the weight matrix from the token embedding as the output layer's weights! torch.Size([30522, 768])
[2022-11-27 15:03:38] - INFO: Cache file /home/********/博一/my_explore/BERT_learn/BertWithPretrained-main/data/WikiText/wiki_test_mlNone_rs2022_mr15_mtr8_mtur5.pt does not exist; reprocessing and caching!
Reading raw data: 100%|██████████████| 4358/4358 [00:00<00:00, 11122.89it/s]
Constructing NSP and MLM samples (test): 100%|██| 1847/1847 [00:00<00:00, 1681180.44it/s]
[2022-11-27 15:03:38] - INFO: Cache file /home/********/博一/my_explore/BERT_learn/BertWithPretrained-main/data/WikiText/wiki_train_mlNone_rs2022_mr15_mtr8_mtur5.pt does not exist; reprocessing and caching!
Reading raw data: 100%|████████████| 36718/36718 [00:03<00:00, 11100.30it/s]
Constructing NSP and MLM samples (train): 100%|█| 15496/15496 [00:00<00:00, 1615704.25it/s]
Traceback (most recent call last):
  File "TaskForPretraining.py", line 300, in <module>
    train(config)
  File "TaskForPretraining.py", line 105, in train
    val_file_path=config.val_file_path)
  File "../utils/create_pretraining_data.py", line 334, in load_train_val_test_data
    collate_fn=self.generate_batch)
  File "/home/pgrad/.conda/envs/wmc_transformer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/pgrad/.conda/envs/wmc_transformer/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
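The ValueError is raised by torch's RandomSampler when the dataset it wraps reports zero samples, which is exactly what an empty preprocessed training set produces. A minimal stand-in for that check (not torch's actual class, just the guard it performs):

```python
class RandomSamplerCheck:
    """Mimics the num_samples guard in torch.utils.data.RandomSampler:
    a dataset of length zero is rejected at construction time."""

    def __init__(self, dataset):
        self.num_samples = len(dataset)
        if self.num_samples <= 0:
            raise ValueError(
                "num_samples should be a positive integer "
                "value, but got num_samples={}".format(self.num_samples))

empty_dataset = []  # the cached .pt file held no training samples
try:
    RandomSamplerCheck(empty_dataset)
except ValueError as e:
    print(e)  # matches the error in the traceback above
```

So the traceback is a symptom, not the root cause: fixing the sentence splitting so the train split is non-empty (and deleting the stale cached `.pt` file so it is regenerated) makes the DataLoader construct normally.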