Can't perform independent write when MPI_File_sync is required by ROMIO driver. #1093

nahaharo opened this issue Aug 5, 2023 · 4 comments


nahaharo commented Aug 5, 2023

Hello.
Recently I've been running MPI jobs for extensive data processing.
However, I'm getting the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver." in the log.

The symptoms are as follows:

  1. It works fine on the local (master) machine.
  2. When run on the compute nodes, it gives the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver.".
  3. With dxpl_mpio=:collective, it gets stuck at the write.

The local machine is directly attached to the disk that I write the HDF5 file to, while the remote nodes access their disk over NFS.

My question is: why does that error appear?
Does it appear because of NFS?
If it is avoidable, how can I avoid it?

Also, in the article "https://www.hdfgroup.org/2015/08/parallel-io-with-hdf5/" there is an H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?

Thanks.

Here is my test code.

using HDF5
using MPI

function main()
    @assert HDF5.has_parallel()

    MPI.Init()
    
    comm = MPI.COMM_WORLD
    info = MPI.Info()
    ff = h5open("test.h5", "w", comm, info)
    MPI.Barrier(comm)
    
    Nproc = MPI.Comm_size(comm)
    myrank = MPI.Comm_rank(comm)
    M = 10
    A = fill(myrank, M, 2)  # local data
    dims = (M, Nproc*2+1)    # dimensions of global data
    
    # Create dataset
    @show "Create dataset"
    dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims), chunk=(M, 2), dxpl_mpio=:collective)
    @show "After dataset"
    
    # Write local data
    dset[:, 2*myrank + 1:2*myrank + 2] = A
    @show "After write dataset"

    close(ff)
    
    MPI.Finalize()
end

main()

And here is the output of MPIPreferences.use_system_binary():

julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│   libmpi = "libmpi"
│   version_string = "MPICH Version:      4.1.2\nMPICH Release date: Wed Jun  7 15:22:45 CDT 2023\nMPICH ABI:          15:1:3\nMPICH Device:       ch4:ofi\nMPICH configure:    --prefix=/home/---/tools/mpich --with-ucx=/home/---/tools/ucx\nMPICH CC:           /home/---/tools/gcc/bin/gcc    -O2\nMPICH CXX:          /home/hyunwook/tools/gcc/bin/g++   -O2\nMPICH F77:          /home/---/tools/gcc/bin/gfortran   -O2\nMPICH FC:           /home/---/tools/gcc/bin/gfortran   -O2\n"
│   impl = "MPICH"
│   version = v"4.1.2"
└   abi = "MPICH"
┌ Info: MPIPreferences unchanged
│   binary = "system"
│   libmpi = "libmpi"
│   abi = "MPICH"
│   mpiexec = "mpiexec"
│   preloads = Any[]
└   preloads_env_switch = nothing

Run script (for sbatch)

#!/bin/bash
#SBATCH -J hdf5_test
#SBATCH -o stdout_log.o%j
#SBATCH -N 1
#SBATCH -n 32

mpiexec.hydra -np $SLURM_NTASKS julia test.jl

My Env

  • CentOS 7.5
  • Slurm with Hydra
  • HDF5 1.14.1
  • GCC 13.2.0
  • MPICH 4.1.2
    (Yes, I built HDF5, GCC, and MPICH from source.)

mkitti commented Aug 5, 2023

@simonbyrne might be best equipped to answer the overall question.

there is an H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?

We don't have a pregenerated binding for H5Sselect_none in HDF5.jl yet. Based on the auto-generated bindings in LibHDF5.jl, you could just invoke the ccall directly:

https://github.com/mkitti/LibHDF5.jl/blob/712b6e306a15de37f748727b37676aca70ea0664/src/LibHDF5.jl#L3816-L3818

julia> import HDF5.API.HDF5_jll: libhdf5

julia> import HDF5.API: herr_t, hid_t

julia> function H5Sselect_none(spaceid)
           ccall((:H5Sselect_none, libhdf5), herr_t, (hid_t,), spaceid)
       end
H5Sselect_none (generic function with 1 method)

julia> dspace = dataspace((1,1))
HDF5.Dataspace: (1, 1)

julia> H5Sselect_none(dspace)
0

julia> dspace
HDF5.Dataspace: (1, 1) [irregular selection]
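For illustration, here is a rough, untested sketch of how a rank that has nothing to write could still take part in a collective write: it selects nothing in both the memory and file dataspaces and issues a matching low-level write call. It reuses dset, A, and myrank from the test code above plus the H5Sselect_none wrapper defined here, and assumes dset.xfer still carries the collective transfer property set at dataset creation.

# Rough sketch (untested): with collective transfers every rank must issue a
# matching H5Dwrite, even if it has nothing to write.
if size(A, 2) > 0
    # ranks that have data write their columns as usual
    dset[:, (2*myrank + 1):(2*myrank + 2)] = A
else
    # ranks with no data make an empty, but still collective, write call
    filespace = dataspace(dset)   # file dataspace of the dataset
    memspace  = dataspace(A)      # memory dataspace of the (empty) local buffer
    H5Sselect_none(filespace)     # select nothing on both sides
    H5Sselect_none(memspace)
    HDF5.API.h5d_write(dset, datatype(eltype(A)), memspace, filespace, dset.xfer, A)
end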

simonbyrne (Collaborator) commented

It could be that you are still using the HDF5 library linked against the bundled MPI library (i.e. not the system one).

You either need to specify it (currently you need to set JULIA_HDF5_PATH), or use MPItrampoline (which requires building a wrapper around your system MPI library).
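As a quick check (just a sketch), you can confirm which HDF5 build HDF5.jl actually loaded:

using HDF5
@show HDF5.has_parallel()   # must be true for MPI-backed (parallel) I/O
@show HDF5.libversion       # should report the 1.14.1 you built, not the bundled library's version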

simonbyrne (Collaborator) commented

If that is not the case, does it work without the chunk option?
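For example, the same create_dataset call from your test code, just without the chunk option:

dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims), dxpl_mpio=:collective)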


nahaharo commented Aug 6, 2023

  1. The system MPI library (MPICH, built from source) was used.
  2. JULIA_HDF5_PATH was set properly.
  3. With or without the chunk option, independent I/O mode still gives the same error.

I think this error occurs because of NFS (based on this issue: https://forum.hdfgroup.org/t/hang-for-mpi-hdf5-in-parallel-on-an-nfs-system/6541/3).
It looks like collective mode is working now, so I'm going with that.

simonbyrne added the MPI label Sep 3, 2023