The log shows that the gpot data sent from lpu_0 to lpu_1 (specifically, the 'initV' values from n_dict_0) contains newline characters "\n" inside an array of floats. I ran 2 LPUs on 1 GPU. The run ends with a segmentation fault inside MPI_Isend on rank 0.
2017-01-31T18:16:04Z:INFO:man |connecting modules lpu_0 and lpu_1
2017-01-31T18:16:04Z:INFO:man |updating routing table with pattern
2017-01-31T18:16:04Z:INFO:man |connected modules lpu_0 and lpu_1
2017-01-31T18:16:05Z:INFO:man |sending steps message (10000)
2017-01-31T18:16:05Z:INFO:man |sending start message
2017-01-31T18:16:05Z:INFO:prc 1 |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:prc 0 |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:mod lpu_0 |running code before body of worker 0
2017-01-31T18:16:05Z:INFO:mod lpu_0 |extracting output ports for lpu_1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |running code before body of worker 1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |extracting input ports for lpu_0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running body of worker 0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running body of worker 1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_0 |gpot data sent to lpu_1: [-0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214]
[archiso:15797] *** Process received signal ***
[archiso:15797] Signal: Segmentation fault (11)
[archiso:15797] Signal code: Invalid permissions (2)
[archiso:15797] Failing at address: 0xb016e0a00
[archiso:15797] [ 0] /usr/lib/libpthread.so.0(+0x11080)[0x7f2488111080]
[archiso:15797] [ 1] /usr/lib/libc.so.6(+0x128855)[0x7f2487e8a855]
[archiso:15797] [ 2] /usr/lib/openmpi/openmpi/mca_btl_vader.so(mca_btl_vader_sendi+0x186)[0x7f247adddba6]
[archiso:15797] [ 3] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(+0x80f6)[0x7f247a52a0f6]
[archiso:15797] [ 4] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x3fd)[0x7f247a52a95d]
[archiso:15797] [ 5] /usr/lib/openmpi/libmpi.so.12(MPI_Isend+0x2ba)[0x7f248459d28a]
[archiso:15797] [ 6] /usr/lib/python2.7/site-packages/mpi4py/MPI.so(+0xcc861)[0x7f24848e5861]
[archiso:15797] [ 7] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5f30)[0x7f2488407c60]
[archiso:15797] [ 8] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [ 9] /usr/lib/libpython2.7.so.1.0(+0x7329d)[0x7f248839029d]
[archiso:15797] [10] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x52)[0x7f2488369692]
[archiso:15797] [11] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3509)[0x7f2488405239]
[archiso:15797] [12] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [13] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5fd2)[0x7f2488407d02]
[archiso:15797] [14] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [15] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [16] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [17] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [18] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x28)[0x7f248840b9e8]
[archiso:15797] [19] /usr/lib/libpython2.7.so.1.0(+0x108efe)[0x7f2488425efe]
[archiso:15797] [20] /usr/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x81)[0x7f24884271c1]
[archiso:15797] [21] /usr/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xf4)[0x7f24884284e4]
[archiso:15797] [22] /usr/lib/libpython2.7.so.1.0(Py_Main+0xce0)[0x7f248843aca0]
[archiso:15797] [23] /usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f2487d82291]
[archiso:15797] [24] /usr/bin/python2(_start+0x2a)[0x55baed2517ea]
[archiso:15797] *** End of error message ***
2017-01-31T18:16:06Z:INFO:mod lpu_1 |sent all data from lpu_1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |receiving from lpu_0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 15797 on node archiso exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
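The "\n" sequences in the logged array may be an artifact of how NumPy formats arrays as text, not something present in the transmitted float data. This is an assumption, since the log alone does not show how the message string was built. A minimal sketch, using a stand-in array with the same length and value as the logged one (the variable names here are hypothetical, not taken from the project):

```python
import numpy as np

# Stand-in for the logged gpot data: 21 floats, all -0.05214 (values copied from the log).
arr = np.full(21, -0.05214)

# Default string conversion wraps long arrays across lines (default line width is 75),
# so the text form contains newline characters even though the data does not.
wrapped = str(arr)
print(repr(wrapped))

# Formatting with a very large line width keeps the same values on a single line.
single_line = np.array2string(arr, max_line_width=10**6)
print(repr(single_line))

# The underlying buffer is plain float64 either way.
print(arr.dtype, arr.shape)
```

If this is what happens in the logging call, the newlines are cosmetic and the segmentation fault in MPI_Isend would need a separate explanation.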