The PnetCDF test/largefile/large_coalesce test returns the value of 0 for bytes that should have a non-zero value.
*** TESTING C large_coalesce for skip filetype buftype coalesce ------ 0 (at line 285): expect buf[1073741814]=97 but got 0
0 (at line 285): expect buf[1073741815]=98 but got 0
0 (at line 285): expect buf[1073741816]=99 but got 0
0 (at line 285): expect buf[1073741817]=100 but got 0
0 (at line 285): expect buf[1073741818]=101 but got 0
0 (at line 285): expect buf[1073741819]=102 but got 0
0 (at line 285): expect buf[1073741820]=103 but got 0
0 (at line 285): expect buf[1073741821]=104 but got 0
0 (at line 285): expect buf[1073741822]=105 but got 0
0 (at line 285): expect buf[1073741823]=106 but got 0
0 (at line 285): expect buf[1073741824]=107 but got 0
0 (at line 285): expect buf[1073741825]=108 but got 0
0 (at line 285): expect buf[1073741826]=109 but got 0
0 (at line 285): expect buf[1073741827]=110 but got 0
0 (at line 285): expect buf[1073741828]=111 but got 0
0 (at line 285): expect buf[1073741829]=112 but got 0
0 (at line 285): expect buf[1073741830]=113 but got 0
0 (at line 285): expect buf[1073741831]=114 but got 0
0 (at line 285): expect buf[1073741832]=115 but got 0
0 (at line 285): expect buf[1073741833]=116 but got 0
0 (at line 293): expect buf[2147483638]=65 but got 0
0 (at line 293): expect buf[2147483639]=66 but got 0
0 (at line 293): expect buf[2147483640]=67 but got 0
0 (at line 293): expect buf[2147483641]=68 but got 0
0 (at line 293): expect buf[2147483642]=69 but got 0
0 (at line 293): expect buf[2147483643]=70 but got 0
0 (at line 293): expect buf[2147483644]=71 but got 0
0 (at line 293): expect buf[2147483645]=72 but got 0
0 (at line 293): expect buf[2147483646]=73 but got 0
0 (at line 293): expect buf[2147483647]=74 but got 0
0 (at line 293): expect buf[2147483648]=75 but got 0
0 (at line 293): expect buf[2147483649]=76 but got 0
0 (at line 293): expect buf[2147483650]=77 but got 0
0 (at line 293): expect buf[2147483651]=78 but got 0
0 (at line 293): expect buf[2147483652]=79 but got 0
0 (at line 293): expect buf[2147483653]=80 but got 0
0 (at line 293): expect buf[2147483654]=81 but got 0
0 (at line 293): expect buf[2147483655]=82 but got 0
0 (at line 293): expect buf[2147483656]=83 but got 0
0 (at line 293): expect buf[2147483657]=84 but got 0
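The failing indices in the output above are not scattered at random: each run of 20 straddles a power-of-two offset. The Python sketch below only redoes that arithmetic on the index ranges copied from the output. The helper name `describe_run` is hypothetical, and the sketch makes no claim about why the reads return zero.

```python
# Inclusive index ranges copied from the failing test output above.
FAILING_RUNS = {
    "line 285": (1073741814, 1073741833),
    "line 293": (2147483638, 2147483657),
}


def describe_run(first, last):
    """Return (exponent, before, after) for the power of two inside [first, last].

    `before` counts failing indices below 2**exponent and `after` is the
    distance from 2**exponent to the last failing index.
    """
    exponent = last.bit_length() - 1   # largest power of two <= last
    boundary = 1 << exponent
    if not first <= boundary <= last:
        raise ValueError("run does not straddle a power of two")
    return exponent, boundary - first, last - boundary


if __name__ == "__main__":
    for where, (first, last) in FAILING_RUNS.items():
        exponent, before, after = describe_run(first, last)
        print(f"{where}: 2**{exponent} - {before} .. 2**{exponent} + {after}")
```

Both runs start 10 bytes before the boundary and end 9 bytes after it, at 2**30 (1 GiB) and 2**31 (2 GiB) respectively.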
This is running as a single process test:
#!/bin/bash
set -x
nodes=$SLURM_NNODES
procs=$(($nodes * 1))
export UNIFYFS_MARGO_CLIENT_TIMEOUT=70000
export UNIFYFS_CONFIGFILE=/var/tmp/unifyfs.conf
touch $UNIFYFS_CONFIGFILE
srun --overlap -n $nodes -N $nodes mkdir /dev/shm/unifyfs
export UNIFYFS_LOGIO_SPILL_DIR=/dev/shm/unifyfs
export UNIFYFS_CLIENT_LOCAL_EXTENTS=1
export UNIFYFS_CLIENT_WRITE_SYNC=0
export UNIFYFS_LOG_VERBOSITY=1
# test_ncmpi_put_var1_schar executes many small writes;
# it was necessary to reduce the chunk size to avoid exhausting space
export UNIFYFS_LOG_DIR=`pwd`/logs
export UNIFYFS_LOGIO_CHUNK_SIZE=$(expr 1 \* 4096)
export UNIFYFS_LOGIO_SHMEM_SIZE=$(expr 1024 \* 1048576)
export UNIFYFS_LOGIO_SPILL_SIZE=$(expr 0 \* 1048576)
export UNIFYFS_CLIENT_SUPER_MAGIC=0
installdir="/path/to/unifyfs.git/install"
export LD_LIBRARY_PATH="${installdir}/lib:${installdir}/lib64:$LD_LIBRARY_PATH"
# turn off darshan profiling
export DARSHAN_DISABLE=1
# sleep for some time after unlink
# see https://github.com/LLNL/UnifyFS/issues/744
export UNIFYFS_CLIENT_UNLINK_USECS=1000000
export LD_PRELOAD="${installdir}/lib/libunifyfs_mpi_gotcha.so"
filename="/unifyfs/testfile.nc"
export UNIFYFS_LOGIO_SHMEM_SIZE=$(expr 8192 \* 1048576)
cd test/largefile
./large_coalesce $filename
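For reference, the `expr` size settings in the script above evaluate as shown in this Python sketch. It simply redoes the arithmetic; the names `ASSIGNMENTS` and `effective_sizes` are hypothetical, and nothing here describes how UnifyFS interprets the values. Because `UNIFYFS_LOGIO_SHMEM_SIZE` is exported twice, the later assignment is the one in effect when the test runs.

```python
MIB = 1048576  # the 1048576 factor used in the script's expr calls

# (variable, multiplier, unit) in the order the script exports them.
ASSIGNMENTS = [
    ("UNIFYFS_LOGIO_CHUNK_SIZE", 1, 4096),
    ("UNIFYFS_LOGIO_SHMEM_SIZE", 1024, MIB),
    ("UNIFYFS_LOGIO_SPILL_SIZE", 0, MIB),
    ("UNIFYFS_LOGIO_SHMEM_SIZE", 8192, MIB),  # overrides the earlier export
]


def effective_sizes(assignments):
    """Return the final byte value of each variable; later exports win."""
    sizes = {}
    for name, multiplier, unit in assignments:
        sizes[name] = multiplier * unit
    return sizes


if __name__ == "__main__":
    for name, size in effective_sizes(ASSIGNMENTS).items():
        print(f"{name}={size}")
```

The effective shared-memory size is therefore 8192 MiB (8 GiB), the chunk size is 4096 bytes, and the spill size is 0.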
adammoody changed the title from "PnetCDF largefile" to "PnetCDF largefile/large_coalesce fails after detecting invalid data during a read" on Dec 28, 2022
adammoody changed the title from "PnetCDF largefile/large_coalesce fails after detecting invalid data during a read" to "PnetCDF large_coalesce test fails due to incorrect data on read (ROMIO problem)" on Dec 28, 2022
The mismatch is reported around this line in the test source:
https://github.com/Parallel-NetCDF/PnetCDF/blob/c7e22c81ac4c2922f84281a4a19f7000079e6c3f/test/largefile/large_coalesce.c#L284
This same test throws a segfault when using Lustre as the file system, so the test failure is not unique to UnifyFS.
Tracing under a debug build of MVAPICH2, the test hits an ADIOI assertion at this line:
https://github.com/pmodels/mpich/blob/5b88f46620607707201768f4b3df39907082f344/src/mpi/romio/adio/common/ad_read_str_naive.c#L311
The value req_len = 2147483126 fails the assertion check req_len == (int) req_len.
The stack trace at this point is:
Apparently, this "naive" read code path within ROMIO does not support requests larger than 2GB.
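To illustrate the kind of round-trip check quoted above, the Python sketch below mimics casting a length to a C `int` and comparing it with the original value. It assumes `int` is 32 bits wide, which is typical but not stated in the issue, and uses `ctypes.c_int32` to model the cast. The function name `fits_in_c_int` and the sample lengths are illustrative only and are not taken from ROMIO.

```python
import ctypes

INT_MAX = 2**31 - 1  # assumption: the C int is 32 bits wide


def fits_in_c_int(length):
    """Mimic the C expression `length == (int) length` for a 32-bit int.

    ctypes performs no overflow checking, so out-of-range values wrap
    around the way a narrowing cast typically does.
    """
    return ctypes.c_int32(length).value == length


if __name__ == "__main__":
    # Illustrative lengths around the 2 GiB boundary.
    for length in (INT_MAX, INT_MAX + 1, 2147483658):
        wrapped = ctypes.c_int32(length).value
        print(f"{length}: as 32-bit int = {wrapped}, fits = {fits_in_c_int(length)}")
```

Any length of 2**31 bytes or more wraps to a negative number under this model, so a code path that stores request lengths in an `int` cannot represent a single request of that size.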