AttributeError: module 'torch.nn.functional' has no attribute 'one_hot' #8

wongdarrell opened this issue Sep 15, 2019 · 5 comments

@wongdarrell

Hi, I downloaded and ran your program and got the training error in the title. I have no GPU, so I changed the setup to fp16 = 'false' (xlnet left as in your demo).

What's the problem?

DarrellWong
code:
if args['do_train']:
    train_dataset = load_and_cache_examples(task, tokenizer)
    global_step, tr_loss = train(train_dataset, model, tokenizer)
    logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)
------------------------------------------------------------ output window----
INFO:__main__:Creating features from dataset file at data/
100%|████████████████████████████████| 560000/560000 [05:07<00:00, 1823.33it/s]
INFO:__main__:Saving features into cached file data/cached_train_xlnet-base-cased_128_binary
INFO:__main__:***** Running training *****
INFO:__main__: Num examples = 560000
INFO:__main__: Num Epochs = 1
INFO:__main__: Total train batch size = 8
INFO:__main__: Gradient Accumulation steps = 1
INFO:__main__: Total optimization steps = 70000
Epoch: 0%| | 0/1 [00:00<?, ?it/s]

HBox(children=(IntProgress(value=0, description='Iteration', max=70000, style=ProgressStyle(description_width=…
-----------------and then error messages --------------------

AttributeError Traceback (most recent call last)
in
1 if args['do_train']:
2 train_dataset = load_and_cache_examples(task, tokenizer)
----> 3 global_step, tr_loss = train(train_dataset, model, tokenizer)
4 logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

in train(train_dataset, model, tokenizer)
43 'token_type_ids': batch[2] if args['model_type'] in ['bert', 'xlnet'] else None, # XLM don't use segment_ids
44 'labels': batch[3]}
---> 45 outputs = model(**inputs)
46 loss = outputs[0] # model outputs are always tuple in pytorch-transformers (see doc)
47 print("\r%f" % loss, end='')

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\pytorch_transformers\modeling_xlnet.py in forward(self, input_ids, token_type_ids, input_mask, attention_mask, mems, perm_mask, target_mapping, labels, head_mask)
1120 input_mask=input_mask, attention_mask=attention_mask,
1121 mems=mems, perm_mask=perm_mask, target_mapping=target_mapping,
-> 1122 head_mask=head_mask)
1123 output = transformer_outputs[0]
1124

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\pytorch_transformers\modeling_xlnet.py in forward(self, input_ids, token_type_ids, input_mask, attention_mask, mems, perm_mask, target_mapping, head_mask)
920 # 1 indicates not in the same segment [qlen x klen x bsz]
921 seg_mat = (token_type_ids[:, None] != cat_ids[None, :]).long()
--> 922 seg_mat = F.one_hot(seg_mat, num_classes=2).to(dtype_float)
923 else:
924 seg_mat = None

AttributeError: module 'torch.nn.functional' has no attribute 'one_hot'

@ThilinaRajapakse
Owner

It looks like your Pytorch is out of date. Can you update it and try again?
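
For what it's worth, F.one_hot was only added in a later PyTorch release (around 1.1), which is why older installs throw that AttributeError. Something along these lines should confirm whether the upgrade took (the upgrade commands in the comment are just the usual conda/pip ones, so adjust them to your environment):

import torch
import torch.nn.functional as F

# F.one_hot is only available in newer PyTorch releases (around 1.1 onwards),
# so older installs raise the AttributeError shown in the traceback.
print(torch.__version__)

if hasattr(F, 'one_hot'):
    # the same kind of call modeling_xlnet.py makes when building the segment matrix
    seg_mat = torch.tensor([[0, 1], [1, 0]])
    print(F.one_hot(seg_mat, num_classes=2).float())
else:
    print("PyTorch is too old; upgrade with 'conda update pytorch -c pytorch' "
          "or 'pip install -U torch' and restart the kernel")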

@wongdarrell
Author

Hi Thilina,
Your suggestion worked. However, it's now been about 32 hours of processing, and it is up to:
INFO:__main__:Saving model checkpoint to outputs/checkpoint-8000
with no sign of stopping. What is the last checkpoint count in your default run (1 epoch)?
Also, what's the setting if I want to freeze all weights except the last layer?
Thanks

@ThilinaRajapakse
Owner

Unfortunately, with no GPU your training speed will be slow. I can't remember the total number of steps, but it should be there in the output right before training starts. Checkpoint-8000 means that 8000 steps have been completed. There should also be a tqdm progress bar with approximate time remaining to completion.
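
For reference, the total step count is just arithmetic on the numbers printed in the training log; a rough sketch (variable names here are illustrative, not necessarily the notebook's):

# values taken from the "Running training" log above
num_examples = 560000
train_batch_size = 8
gradient_accumulation_steps = 1
num_train_epochs = 1

# one optimization step per (batch size x accumulation steps) examples
steps_per_epoch = num_examples // (train_batch_size * gradient_accumulation_steps)
t_total = steps_per_epoch * num_train_epochs
print(t_total)  # 70000, matching "Total optimization steps" in the log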

I can't remember off the top of my head, but I can get back to you on freezing layers in a few hours. Usually, though, fine-tuning transformer models is done without freezing any of the layers.

I think it would be best if you used Google Colab with GPU rather than running it locally if a GPU is not available.

@wongdarrell
Author

If that's correct, it has only reached 8000 of the 70000 steps you embedded as t_total = 70000.
Regarding fine-tuning, I thought that several transfer-learning examples freeze all weight layers except the last one, which is usually specific to the new problem.
It seems that I will need to switch to Colab then.

@ThilinaRajapakse
Owner

For most transfer learning tasks, you would usually freeze the earlier layers. But in the case of BERT and other derivatives, the approach is to fine-tune all parameters, albeit for only a few epochs. This was the same approach used in the BERT paper.

For each task, we simply plug in the task specific inputs and outputs into BERT and finetune all the parameters end-to-end.
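
That said, if you do want to experiment with freezing, a minimal sketch along these lines should work, assuming the model variable from the notebook (the 'transformer.' prefix follows pytorch-transformers' XLNetForSequenceClassification parameter naming, so verify it against your model before relying on it):

# Freeze the XLNet backbone and leave the task-specific head trainable.
# Assumption: backbone parameters are named with a 'transformer.' prefix,
# as in pytorch-transformers' XLNetForSequenceClassification.
for name, param in model.named_parameters():
    if name.startswith('transformer.'):
        param.requires_grad = False

# build the optimizer from the trainable parameters only
trainable_params = [p for p in model.parameters() if p.requires_grad]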
