RuntimeError: CUDA error: device-side assert triggered #8

Open
PopMeshgrid opened this issue Oct 23, 2019 · 4 comments
Comments

@PopMeshgrid

/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
File "train.py", line 364, in
decoder_input_init,decoder_hidden_init,attention_sum_init,decoder_attention_init)
File "train.py", line 212, in my_train
if int(y[0][i][di]) == 0:
RuntimeError: CUDA error: device-side assert triggered

When I train on my own dataset, I get the error above. How can I solve it?

@PopMeshgrid
Author

/opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Traceback (most recent call last):
File "Train.py", line 360, in
decoder_input_init,decoder_hidden_init,attention_sum_init,decoder_attention_init)
File "Train.py", line 214, in my_train
loss += criterion(decoder_output[i], y[:,i,di])
File "/gpu/zhengtianxiang/soft/Anaconda1/envs/ocr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/gpu/zhengtianxiang/soft/Anaconda1/envs/ocr/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 210, in forward
return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
File "/gpu/zhengtianxiang/soft/Anaconda1/envs/ocr/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111

The output above gives more details.

@Jeremy-lf

This just means that Voc_size does not match the actual vocabulary size of your dataset.
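The assertion in the log, `t >= 0 && t < n_classes`, fires when a target index falls outside the range the loss expects. A minimal sketch of a pre-training check is shown here, assuming the labels are available as plain lists of integer class indices. The function name `find_out_of_range_labels` and the sample data are hypothetical and not part of this repository; the `ignore_index=-100` default mirrors the default of `torch.nn.NLLLoss`.

```python
def find_out_of_range_labels(label_seqs, n_classes, ignore_index=-100):
    """Return (sequence_idx, position, label) for each label outside [0, n_classes)."""
    bad = []
    for seq_idx, seq in enumerate(label_seqs):
        for pos, label in enumerate(seq):
            # Targets equal to ignore_index are skipped by the loss, so skip them here too.
            if label == ignore_index:
                continue
            if not 0 <= label < n_classes:
                bad.append((seq_idx, pos, label))
    return bad


# Hypothetical label sequences; with 112 classes the valid indices are 0..111.
labels = [[1, 5, 111, 0], [3, 112, 0, 0]]
print(find_out_of_range_labels(labels, n_classes=112))
# [(1, 1, 112)]
```

If the returned list is non-empty, either the labels were encoded with a larger dictionary than the model was built for, or Voc_size needs to be raised to cover them.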

@ZacHu-ZYH

@PopMeshgrid Hi, I ran into the same problem. Have you solved it? Thanks!

@whywhs
Owner

whywhs commented Dec 17, 2019

I suggest checking the number of label classes in your dataset.
The CROHME dataset has 110 symbols plus 'eol' and 'sos', so my dataset has 112 label classes. Change the parameter in Train.py (line 275) so that it matches the number of label classes in your own dataset.
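The count described above (symbols plus the two special tokens) can be derived from the data instead of being hard-coded. The following is only a sketch under stated assumptions: `build_vocab`, the token order, and the sample symbols are hypothetical, and the repository's real dictionary format may differ.

```python
def build_vocab(symbols, special_tokens=("eol", "sos")):
    """Map each distinct token to a contiguous integer index starting at 0."""
    vocab = {}
    # Assumption: special tokens come first; the real dictionary may order them differently.
    for token in list(special_tokens) + sorted(set(symbols)):
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


# Hypothetical symbol list; duplicates are collapsed before indexing.
symbols = ["x", "y", "+", "=", "2", "x"]
vocab = build_vocab(symbols)
n_classes = len(vocab)  # value to use for the class-count parameter
print(n_classes)
# 7
```

With CROHME's 110 symbols, the same calculation gives 110 + 2 = 112, which matches the value mentioned above.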
