Pretraining on the songci and wiki2 datasets raises an error; the generated masked-sample cache file wiki_train_mlNone_rs2022_mr15_mtr8_mtur5.pt is only 1 KB #14

Open
Phil-521 opened this issue Nov 27, 2022 · 4 comments

Comments

@Phil-521

Note: the MyMultiHeadAttention implementation from the local MyTransformer is being used

[2022-11-27 15:03:35] - INFO: ## Using the token embedding weight matrix as the output layer weights! torch.Size([30522, 768])
[2022-11-27 15:03:38] - INFO: Cache file /home/********/博一/my_explore/BERT_learn/BertWithPretrained-main/data/WikiText/wiki_test_mlNone_rs2022_mr15_mtr8_mtur5.pt does not exist; reprocessing and caching!

Reading raw data: 100%|██████████████| 4358/4358 [00:00<00:00, 11122.89it/s]

Building NSP and MLM samples (test): 100%|██| 1847/1847 [00:00<00:00, 1681180.44it/s]

[2022-11-27 15:03:38] - INFO: Cache file /home/********/博一/my_explore/BERT_learn/BertWithPretrained-main/data/WikiText/wiki_train_mlNone_rs2022_mr15_mtr8_mtur5.pt does not exist; reprocessing and caching!

Reading raw data: 100%|████████████| 36718/36718 [00:03<00:00, 11100.30it/s]

Building NSP and MLM samples (train): 100%|█| 15496/15496 [00:00<00:00, 1615704.25it/

Traceback (most recent call last):
  File "TaskForPretraining.py", line 300, in <module>
    train(config)
  File "TaskForPretraining.py", line 105, in train
    val_file_path=config.val_file_path)
  File "../utils/create_pretraining_data.py", line 334, in load_train_val_test_data
    collate_fn=self.generate_batch)
  File "/home/pgrad/.conda/envs/wmc_transformer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/pgrad/.conda/envs/wmc_transformer/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
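
The traceback ends in RandomSampler because the dataset handed to the DataLoader has length 0, which suggests that preprocessing produced no samples at all. The roughly 1 KB cache file and the very high it/s figure on the sample-building progress bars point the same way. Below is a minimal sketch of an early check, assuming the samples are available as a list before the DataLoader is built. `check_samples` and its arguments are hypothetical names, not part of the repository.

```python
import os


def check_samples(samples, cache_path=None):
    """Fail early with a descriptive message if preprocessing produced nothing.

    `samples` is assumed to be the list of NSP/MLM samples; `cache_path` is the
    optional location of the cached .pt file (both names are hypothetical).
    """
    if len(samples) == 0:
        hint = ""
        if cache_path is not None and os.path.exists(cache_path):
            size = os.path.getsize(cache_path)
            hint = " (cache file {} is only {} bytes)".format(cache_path, size)
        raise ValueError("preprocessing produced 0 samples" + hint)
    return len(samples)


# Usage: call this right before constructing the DataLoader, e.g.
#   check_samples(train_data, cache_path="path/to/cache.pt")
```

The cache file name shown in the log does not appear to encode the sentence separator, so a stale, near-empty .pt file may need to be deleted by hand before rerunning. That is an inference from the file name, not something confirmed in this thread.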

@haidequanbu

I ran into the same problem. Did you manage to solve it?

@haidequanbu

haidequanbu commented Dec 3, 2022

Solved. See my other issue, issue 15: https://github.com/moon-hotel/BertWithPretrained/issues/15
Copy the link into your browser to open it.

@moon-hotel
Owner

You will have to troubleshoot this one yourself; I have not run into it. My guess is that your PyTorch version has an extra num_samples parameter?

@yfzsj01

yfzsj01 commented May 19, 2023

This is most likely a dataset splitting problem. When running on the wiki dataset, set ModelConfig.seps='.'
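
To illustrate why the separator could matter, here is a rough sketch. It assumes the preprocessing splits each line into sentences on `seps` and skips any paragraph with fewer than two sentences, since NSP needs sentence pairs. `split_paragraphs` is a hypothetical stand-in, not the repository's actual function.

```python
def split_paragraphs(lines, seps):
    """Split each line into sentences on `seps`; keep paragraphs with >= 2 sentences."""
    paragraphs = []
    for line in lines:
        sentences = [s.strip() for s in line.split(seps) if s.strip()]
        if len(sentences) >= 2:  # NSP needs at least two sentences per paragraph
            paragraphs.append(sentences)
    return paragraphs


# Made-up English lines in the style of a wiki dump (illustrative only).
lines = [
    "The game was released in 2011 . It sold well in Japan .",
    "= Heading =",
]

# With the Chinese full stop as separator, no English line splits in two.
print(len(split_paragraphs(lines, "。")))  # 0
# With '.', the first line yields two sentences and is kept.
print(len(split_paragraphs(lines, ".")))   # 1
```

Under that assumption, a separator meant for Chinese text leaves every English paragraph with a single "sentence", so no samples are built and the DataLoader receives an empty dataset.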
