Forward flux sampling crashing when N_walker < N_procs #10
Hi Oliver, I haven't tried to reproduce your error using your input files yet, but one thing that has caught my eye is that you have a linker error in your
Also, the output of the following commands would be helpful to understand if it is a compiling/linking problem or a bug in the source code:
Hi Julien, Many thanks for your response. I posted earlier today, but deleted the comment because I have an update. There is definitely a problem with MacPorts and its openmpi library. When using the standard library (e.g. openmpi-devel-default), I even get a fatal error during the linking stage and no executable. This is why I decided to move to our local HPC system, where openmpi and gcc were compiled and installed from source. Although no warning or error now occurs during the build process (see the attached files cmake_config_archie.txt and build_archie.txt), the error persists. I also attach the stdout from the runs with 2 walkers on 2 processes (slurm-313283.out) and 1 walker on 2 processes (slurm-313284.out). The output of the above commands is
Let me know if there is anything else I can do. Best wishes, cmake_config_archie.txt
Hi Oliver, the first thing I noticed is that there is a problem in the
This command leads to inconsistent linker flags and, depending on how strict your linker is, could lead to an error. Maybe this will fix the compilation on your Mac. Also, I made a typo in my previous comment. Please give the output of
(with two d's). Concerning the error
Hi Julian, Removing the above line in Regarding the output of
In the example directory
Unfortunately, the current FFS implementation is set up for 1 MPI process per walker. This is a definite shortcoming of this method at this time, and we will most likely be overhauling FFS at some point to make it more rigorous and able to handle more use cases. For some "large" cases, we've seen the method fail to calculate the rate and committor probability automatically. In addition, the filenames for the "failed" trajectories are numbered cumulatively instead of by interface.
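Given that constraint (exactly one MPI process per walker), a run such as 1 walker on 2 processes could be rejected up front with a clear message rather than failing later with a missing-atom error. The sketch below is hypothetical: `CheckFFSLayout` is not an SSAGES function, and where such a check would live in SSAGES's input handling is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical pre-flight check (not part of SSAGES): the current FFS
// implementation expects exactly one MPI process per walker.
// Returns an empty string if the layout is supported, otherwise an error message.
std::string CheckFFSLayout(int nprocs, int nwalkers)
{
    if (nprocs <= 0 || nwalkers <= 0)
        return "number of processes and walkers must be positive";

    // Processes are split evenly among walkers, so the counts must divide.
    if (nprocs % nwalkers != 0)
        return "number of processes must be divisible by number of walkers";

    const int procsPerWalker = nprocs / nwalkers;
    if (procsPerWalker != 1)
        return "FFS currently requires 1 MPI process per walker, got "
               + std::to_string(procsPerWalker);

    return "";  // supported layout
}
```

For example, `CheckFFSLayout(2, 1)` reports 2 processes per walker, while `CheckFFSLayout(2, 2)` returns an empty string.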
Dear SSAGES Developers,
I started with the FFS tutorial in /Examples/User/ForwardFlux/LAMMPS/Langevin/ and get a fatal error about missing atoms when the number of processes is larger than (but divisible by) the number of walkers, e.g. 1 walker on 2 processes.
The error occurs because of line #354 in /src/Methods/ForwardFlux.cpp. The default atom index, set to -1 in line #323, is never overwritten, so it stays negative because the atom cannot be found.
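One plausible reading of this failure: with 2 MPI ranks per walker, LAMMPS spatially decomposes the system, so each rank stores only part of the atoms, and a purely local lookup leaves the -1 default in place on any rank that does not own the atom. The sketch below models that situation without MPI. `RankAtoms`, `FindLocalIndex` and `FindOwnerRank` are hypothetical names and do not reflect SSAGES's `Snapshot` class or `GetLocalIndex()`; the loop in `FindOwnerRank` stands in for an `MPI_Allreduce` with `MPI_MAX` across the walker's ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one MPI rank's share of the system: the global
// atom IDs this rank currently owns, in local storage order.
struct RankAtoms {
    std::vector<int> globalIds;
};

// Local lookup: returns the local index of globalId on this rank, or -1 if
// the atom lives on another rank (the default that survives on non-owners).
int FindLocalIndex(const RankAtoms& rank, int globalId)
{
    auto it = std::find(rank.globalIds.begin(), rank.globalIds.end(), globalId);
    if (it == rank.globalIds.end())
        return -1;
    return static_cast<int>(it - rank.globalIds.begin());
}

// Emulates MPI_Allreduce(..., MPI_MAX, ...) over the walker's ranks: each rank
// contributes its own rank number if it owns the atom, otherwise -1, so the
// maximum is the owning rank (or -1 if no rank has the atom).
int FindOwnerRank(const std::vector<RankAtoms>& ranks, int globalId)
{
    int owner = -1;
    for (int r = 0; r < static_cast<int>(ranks.size()); ++r) {
        const int contribution = FindLocalIndex(ranks[r], globalId) >= 0 ? r : -1;
        owner = std::max(owner, contribution);
    }
    return owner;
}
```

With atoms {1, 2, 3} on rank 0 and {4, 5, 6} on rank 1, looking up atom 5 gives -1 on rank 0 but local index 1 on rank 1, and `FindOwnerRank` returns 1. In such a layout, code that uses the local index would need to restrict that access to the owning rank or gather the data across ranks first.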
There is a comment in line #342 and below reading
//FIXME: try using snapshot->GetLocalIndex()
//copied from Ben's previous implementation
Does this suggest this has been taken from another implementation and perhaps doesn't work as expected?
I attach a zip archive with all input files, etc., for a run with 1 walker on 2 MPI tasks, which should allow you to reproduce the issue. I modified the tutorial example in a logical way to decrease the number of walkers from two to one. I don't think I made a mistake here, but please check this first. It is not obvious to me what could have gone wrong.
I also attach the stdout from the configuration and installation steps, which should show you which MPI library etc. I used. I could reproduce the issue on a completely different system. In all instances LAMMPS works fine in parallel, and SSAGES was built with a copy of that distribution.
I'm happy to help with the fix, but would require more input as I'm obviously not very familiar with the code. I understand how difficult it is to find the time to help others as I'm doing the same on a number of projects. So your help and time is very much appreciated.
Best wishes,
Oliver
1walker2proc.zip
cmake_config.txt
build.txt