Error when running hm.py #4

darthgera123 · 2021-05-10T20:43:21Z

Hi, I wanted to run vilio for my experiment. I made a copy of fts_tsv/hm_data_tsv.py and updated the HMTorchDataset class to read from a single file (instead of splits). The pastebin is here. After passing the data through the model (I'm using U), I'm getting this error:

Traceback (most recent call last):
  File "hm_uniter.py", line 392, in <module>
    main()
  File "hm_uniter.py", line 361, in main
    hm.train(hm.train_tuple, hm.valid_tuple)
  File "hm_uniter.py", line 187, in train
    logit = self.model(sent, (feats, boxes))
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/entryU.py", line 200, in forward
    seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda())
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 418, in forward
    encoded_layers = self.encoder(
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 304, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 185, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 158, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/functional.py", line 1369, in gelu
    return torch._C._nn.gelu(input)
RuntimeError: CUDA error: device-side assert triggered

I haven't changed any other file. Online, the suggested solution is to fix the numbering of the labels, which I don't think is the issue. The error occurs in entryU.py when running seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda()).
Also, I'm using uniter and bert-base.
Please, please help @Muennighoff


darthgera123 · 2021-05-10T21:38:17Z

Error trace after I added CUDA_LAUNCH_BLOCKING=1 ahead of python hm.py, as the error I was getting was deterministic.
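For anyone reproducing this, the same variable can also be set from inside the script instead of on the command line. This is only a minimal sketch: the helper name enable_sync_cuda_errors is made up for illustration, and it assumes the call runs before torch initializes CUDA, since the variable is read when the CUDA runtime starts.

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Hypothetical helper: ask CUDA to launch kernels synchronously.

    With CUDA_LAUNCH_BLOCKING=1 the Python traceback should point at the
    operation that actually failed instead of a later, unrelated call.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the script, before any CUDA work.
enable_sync_cuda_errors()
# import torch  # assumption: torch is imported and used only after this point
```

Synchronous launches slow training down, so this is meant for debugging runs only.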

Muennighoff · 2021-05-22T06:21:40Z

@darthgera123 Sorry for the late reply! This is most likely a shape mismatch. What are the dimensions of your images, and are you using the HM Dataset?
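One way to narrow down a suspected shape or index mismatch is to validate a batch on the CPU before it is moved to the GPU, where a bad index only surfaces as a device-side assert. The sketch below is not part of vilio: check_batch and its parameters are hypothetical, the expected sizes must come from your own model config, and it assumes gather_index indexes into the concatenation of text and image tokens along dimension 1.

```python
import torch


def check_batch(input_ids, img_feats, img_pos_feats, gather_index,
                vocab_size, feat_dim, pos_dim):
    """Hypothetical pre-flight check; returns a list of problem descriptions.

    Assumes non-empty CPU tensors shaped (batch, text_len) for input_ids and
    (batch, num_boxes, dim) for the two image feature tensors.
    """
    problems = []

    # Token ids outside [0, vocab_size) break the embedding lookup on the GPU.
    lo, hi = input_ids.min().item(), input_ids.max().item()
    if lo < 0 or hi >= vocab_size:
        problems.append(f"input_ids range [{lo}, {hi}] exceeds vocab size {vocab_size}")

    feats_ok = img_feats.dim() == 3 and img_feats.size(-1) == feat_dim
    if not feats_ok:
        problems.append(f"img_feats has shape {tuple(img_feats.shape)}, expected (B, N, {feat_dim})")

    pos_ok = img_pos_feats.dim() == 3 and img_pos_feats.size(-1) == pos_dim
    if not pos_ok:
        problems.append(f"img_pos_feats has shape {tuple(img_pos_feats.shape)}, expected (B, N, {pos_dim})")

    if feats_ok and pos_ok and img_feats.shape[:2] != img_pos_feats.shape[:2]:
        problems.append("img_feats and img_pos_feats disagree on batch size or box count")

    # Assumption: gather_index selects from [text tokens, image boxes] per row.
    if feats_ok and input_ids.dim() == 2:
        total_len = input_ids.size(1) + img_feats.size(1)
        g_lo, g_hi = gather_index.min().item(), gather_index.max().item()
        if g_lo < 0 or g_hi >= total_len:
            problems.append(f"gather_index range [{g_lo}, {g_hi}] exceeds sequence length {total_len}")

    return problems


# Toy batch with placeholder sizes (vocab 100, feature dim 8, position dim 7).
issues = check_batch(
    torch.tensor([[1, 2, 3]]),
    torch.zeros(1, 4, 8),
    torch.zeros(1, 4, 7),
    torch.arange(7).unsqueeze(0),
    vocab_size=100, feat_dim=8, pos_dim=7,
)
print(issues)
```

Calling something like this on the tensors just before the .cuda() calls in entryU.py would show whether the custom single-file dataset produces ids or feature shapes the model does not expect.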
