This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

got segfault from lenet with stn example #9050

Closed
iblislin opened this issue Dec 13, 2017 · 2 comments

Comments

@iblislin
Member

Hi,
We encountered a segfault with the spatial transformer network (STN) example.
The original issue is dmlc/MXNet.jl#369.

TL;DR:
The segfault happens in the CPU version of mshadow::BilinearSamplingBackward.

gdb trace here:

Thread 37 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff35a15700 (LWP 13819)]
0x00007fff83e77ba0 in mshadow::BilinearSamplingBackward<float> (input_grad=..., grid_src_data=..., output_grad=..., 
    input_data=...) at src/operator/spatial_transformer.cc:120
120                   *(g_input + data_index + 1) += *(grad + grad_index) * top_left_y_w
(gdb) bt
#0  0x00007fff83e77ba0 in mshadow::BilinearSamplingBackward<float> (input_grad=..., grid_src_data=..., output_grad=..., 
    input_data=...) at src/operator/spatial_transformer.cc:120
#1  0x00007fff83e5f18c in mxnet::op::SpatialTransformerOp<mshadow::cpu, float>::Backward (this=0x38bcd30, ctx=..., 
    out_grad=std::vector of length 1, capacity 1 = {...}, in_data=std::vector of length 2, capacity 2 = {...}, 
    out_data=std::vector of length 3, capacity 3 = {...}, req=std::vector of length 2, capacity 2 = {...}, 
    in_grad=std::vector of length 2, capacity 2 = {...}, aux_args=std::vector of length 0, capacity 0)
    at src/operator/./spatial_transformer-inl.h:136
(gdb) p grad
$1 = (const float *) 0x7fff251e6f90
(gdb) p top_left_y_w
$2 = 0.376614928
(gdb) p grad_index
$3 = 0
(gdb) p *(grad + grad_index)                                                                                              
$4 = 0.00177509966
(gdb) p g_input + data_index + 1
$5 = (float *) 0x80032442cf50
(gdb) p g_input
$6 = (float *) 0x7fff2442cf50
(gdb) p data_index
$7 = 4294967295

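Note that data_index is 4294967295, which is what -1 looks like as an unsigned 32-bit integer, so data_index has effectively become a negative number.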
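
To make the arithmetic in the gdb session easier to check, here is a minimal Python sketch. The constants are copied from the gdb output above. The helper names `as_uint32` and `element_address` are made up for illustration. The 4-byte float size and the unsigned 32-bit type of `data_index` are assumptions suggested by the printed values, not something confirmed from the MXNet source.

```python
# Values copied from the gdb session above.
G_INPUT = 0x7FFF2442CF50     # (gdb) p g_input
DATA_INDEX = 4294967295      # (gdb) p data_index
FAULT_ADDR = 0x80032442CF50  # (gdb) p g_input + data_index + 1
SIZEOF_FLOAT = 4             # assumption: float is 4 bytes wide


def as_uint32(value):
    """Reinterpret a Python int as an unsigned 32-bit value (hypothetical helper)."""
    return value & 0xFFFFFFFF


def element_address(base, index, elem_size=SIZEOF_FLOAT):
    """Byte address of element `index` in an array starting at `base` (hypothetical helper)."""
    return base + index * elem_size


# -1 stored in an unsigned 32-bit slot has the same bit pattern as data_index.
print(as_uint32(-1) == DATA_INDEX)

# gdb evaluates g_input + data_index + 1 with 64-bit pointer arithmetic,
# which lands 16 GiB past g_input, far outside the gradient buffer.
print(hex(element_address(G_INPUT, DATA_INDEX + 1)))
print(element_address(G_INPUT, DATA_INDEX + 1) == FAULT_ADDR)
```

If the assumptions hold, the address gdb prints for `g_input + data_index + 1` is exactly `g_input` plus 2^32 floats, which is consistent with an index of -1 being computed somewhere upstream and then used as an unsigned offset.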

The segfault can also be reproduced with the Python example (using the 1.0 prebuilt binary); see dmlc/MXNet.jl#369 (comment):

./train_mnist.py --network lenet --add_stn --optimizer adam
@sami-badawi

I get this segfault:

Segmentation fault: 11

when running the C++ version of the example in
cpp-package/example/lenet

This is where the segfault is thrown:

    Symbol conv1 =
        Convolution("conv1", data, conv1_w, conv1_b, Shape(5, 5), 20);

I built it on OS X 10.13.2.
I disabled as many libraries as possible.

I have been able to run the Python version of lenet when I installed it with pip.

@haojin2
Contributor

haojin2 commented Jul 20, 2018

@iblis17 Can you still reproduce the error with the latest code? I tried out the Python reproduction and verified that this should be fixed already. If you can confirm that the bug no longer appears on your side, would you mind closing the issue? Thanks!
