- Dialup Info: (Do not post to public mailing list or public wiki)
- Brad Benton
- Edgar Gabriel
- Geoffroy Vallee
- George
- Howard
- Josh Hursey
- Nathan Hjelm
- Ralph
- Ryan Grant
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
- mpirun hangs at the very end of the run, ONLY on SLES 12 and only with a minimum of 40 procs/node. Only seen in certain cases; not sure what's going on.
- Is mpirun not exiting because the orted isn't exiting? Nathan saw this on 2.0.
- Waiting on Paul Hargrove.
- No objections to Ralph shipping 1.10.2.
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Group comms weren't working for communicators whose size is a power of 2. Nathan found a massive memory issue.
https://github.com/open-mpi/ompi/issues/1252 - Nathan is working on a decay function for progress functions to "fix" this.
- Nathan has been delayed until later this week. Could be done by the middle of next week.
- George commented that the openib BTL specifically could be made to progress only when a send/recv message is posted.
- uGNI progress: could check for datagrams only periodically (only a 200 ns hit).
- Prefer to stick with Nathan's original decay function without modifying openib.
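To make the decay idea for issue 1252 concrete, here is a minimal sketch in C. It assumes a simple scheme in which a progress callback that keeps finding no work is called less and less often (the interval doubles up to a cap), and any activity resets it to "every poll". The type and function names (`decaying_progress_t`, `decaying_progress_poll`, `idle_progress`) and the doubling/cap policy are hypothetical illustrations, not Nathan's actual implementation.

```c
#include <stdint.h>

/* Hypothetical decaying wrapper around a progress callback.
 * A callback that keeps returning 0 events is skipped on more and more
 * polls (interval doubles up to max_interval); any event resets it. */
typedef int (*progress_fn_t)(void);

typedef struct {
    progress_fn_t fn;       /* underlying progress function */
    uint32_t interval;      /* call fn once every `interval` polls */
    uint32_t max_interval;  /* cap on the back-off */
    uint32_t countdown;     /* polls left before the next real call */
} decaying_progress_t;

void decaying_progress_init(decaying_progress_t *p, progress_fn_t fn,
                            uint32_t max_interval)
{
    p->fn = fn;
    p->interval = 1;
    p->max_interval = max_interval;
    p->countdown = 0;
}

/* Returns the number of events the underlying function reported,
 * or 0 if this poll was skipped because we are backing off. */
int decaying_progress_poll(decaying_progress_t *p)
{
    if (p->countdown > 0) {
        p->countdown--;
        return 0;                       /* skipped this poll */
    }
    int events = p->fn();
    if (events > 0) {
        p->interval = 1;                /* activity: poll every time again */
    } else if (p->interval < p->max_interval) {
        p->interval *= 2;               /* idle: back off exponentially */
        if (p->interval > p->max_interval)
            p->interval = p->max_interval;
    }
    p->countdown = p->interval - 1;
    return events;
}

/* Example (hypothetical): a progress function that never finds work. */
static int idle_calls = 0;
static int idle_progress(void)
{
    idle_calls++;
    return 0;
}
```

With `max_interval = 8`, an always-idle component is called on polls 1, 3, 7, 15, 23, and so on, so its cost shrinks without ever being starved completely. A real design would also need to decide how quickly to re-arm after a message arrives, which is the trade-off George's "only progress openib when something is posted" suggestion avoids.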
https://github.com/open-mpi/ompi/issues/1225 - TotalView debugger problem + PMI-x.
- SLURM users launch with srun, which doesn't have this issue.
- DDT does NOT have this issue either. Don't know why it's different. Attach FIFO.
- mpirun waits on a pipe for the debugger to write a 1 on that pipe.
- Don't see how that CAN work.
- Nathan has been using attach rather than "mpirun --debug". Attach happens after launch, so it doesn't go through this step. Nathan thinks this is not so critical, since attach works.
- Anything will work as long as you're ATTACHING to a running job rather than launching through the debugger.
- Barring a breakthrough with PMI-x notify in the next week, we'll do an RC2 and carefully document what does and doesn't work with debuggers.
- Will disable "mpirun --debug" on the 2.0 branch and print an error saying it's broken.
- No longer a blocker for 2.0.0 due to schedule. Still want to fix this for the next release.
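The launch-time handshake described for issue 1225 (mpirun blocking on a pipe until the debugger writes a 1) can be sketched with plain POSIX calls. This is a hypothetical illustration, not ORTE's actual code: it assumes the "1" is sent as a single ASCII byte, and the function names `wait_for_debugger_release` and `release_launcher` are made up for the example.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical launcher side: block until the peer (e.g., a debugger)
 * writes the byte '1' to the pipe whose read end is `fd`.
 * Returns 0 on release, -1 if the pipe closed or a read error occurred. */
int wait_for_debugger_release(int fd)
{
    char c;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n == 1) {
            if (c == '1')
                return 0;      /* debugger released us */
            continue;          /* ignore any other byte */
        }
        if (n == 0)
            return -1;         /* writer closed without releasing us */
        if (errno != EINTR)
            return -1;         /* genuine read error */
    }
}

/* Hypothetical debugger side: write the release byte to the write end. */
int release_launcher(int fd)
{
    const char one = '1';
    return write(fd, &one, 1) == 1 ? 0 : -1;
}
```

If the debugger never writes that byte, the launcher blocks in `read()` forever, which is consistent with why attaching to an already-running job (which never enters this wait) behaves differently from launching through the debugger.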
- No new features (except for
- Howard will review
- review group comm
- Don't know if we'll bother with the PLFS filesystem.
- UCX using modex stuff.
- OMPI-IO + Lustre slow on 2.0.0 (and master) branches. Discussed making ROMIO the default for OMPI on Lustre (only).
A bunch of failures on the master branch. No one has had a chance to look at them yet.
Cisco and Ivy cluster.
Nathan is seeing a "resource deadlock avoided" error in Waitall. Looks like a TCP BTL issue; something is going on down there. Should be fairly easy to test. Cisco TCP one-sided stuff.
- Nathan will see if he can figure this out. One-sided pt2pt hasn't changed recently, so this is surprising. Maybe proc locks are on by default? Need to work this out. Locks were just changed from conditional to unconditional.
Edgar found some Lustre issues: OMPI master has bad MPI-IO performance on Lustre. It used to look reasonable on master, but performance is now poor. Not completely sure when that changed.
- For Lustre itself, could switch the default back to ROMIO.
- GPFS and others look good, but Lustre is bad. Can't have OMPI-IO as the default on Lustre.
- Problem for 2.0.0 AND master branch.
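A Lustre-only ROMIO default would need some way to recognize Lustre at file-open time. One possible approach, sketched below, keys off the filesystem type reported by Linux `statfs(2)` and Lustre's superblock magic `0x0BD00BD0`. The sketch is Linux-specific and hypothetical: the function names and the returned strings `"romio"` / `"ompio"` are illustrative, not Open MPI's actual component-selection logic or MCA parameter values.

```c
#include <stddef.h>
#include <sys/vfs.h>   /* statfs(2); Linux-specific */

/* Lustre's superblock magic as reported in statfs.f_type. */
#define LUSTRE_SUPER_MAGIC_SKETCH 0x0BD00BD0L

/* Hypothetical policy: prefer ROMIO on Lustre, OMPIO everywhere else. */
const char *choose_default_io_for_fstype(long f_type)
{
    return f_type == LUSTRE_SUPER_MAGIC_SKETCH ? "romio" : "ompio";
}

/* Look up the filesystem that holds `path`; NULL if statfs fails. */
const char *choose_default_io(const char *path)
{
    struct statfs sb;
    if (statfs(path, &sb) != 0)
        return NULL;
    return choose_default_io_for_fstype((long)sb.f_type);
}
```

Keeping the per-filesystem decision in one small function means GPFS and other filesystems that already perform well keep OMPI-IO as the default, and only the Lustre case changes.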
https://github.com/open-mpi/ompi/issues/398 ready for a pull request.
- Nathan: should go to 2.1 (since the mpool changes were pushed to 2.1).
https://github.com/open-mpi/ompi/pull/1118 - mpool rewrite should be ready to go, but want George to look at it and make comments. Probably one of the first 2.1 requests after it goes into master.
https://github.com/open-mpi/ompi/pull/1296 - PMI-x: spreads changes from PMI-x across non-PMI-x infrastructure. Is that okay?
- This only makes changes in OMPI-specific glue.
- Should go into 2.0.0. Plugs leaks; minor, but still good.
https://github.com/open-mpi/ompi/pull/1290 - OPAL HOTEL problem. Do we need to get this into 2.0 as well?
- Definitely needs to go into 2.0! Jeff is using it in 1.10.
https://github.com/open-mpi/ompi/pull/1278 - Nathan might want to look at this. Gilles is fixing derived datatypes in one-sided.
- Nathan says it looks okay. Perfectly reasonable to use two different sets of tags.
- Absolutely a 2.0.0 bug as well.
- Nathan will merge it and open the PR.
- Mellanox - (via email update after the meeting)
We are just now preparing the patch to open a PR. We finished testing this morning and got the ‘OK’ from the UCX folks to open a PR. Sorry for the delay; we just wanted to be sure all the ‘t’s were crossed and ‘i’s dotted before submission.
- https://github.com/open-mpi/ompi-release/pull/891
- Sandia - Ryan is working on getting some bug fixes into 2.0. No major issues.
- Intel - Working on the MTT rewrite. Trying to track down the error notification issue. Not many cycles.
- Rewriting the client in Python to make it more pluggable, and extending the feature set to handle a broader range of stages.
- Josh has been working on the reporter side (last 6 months) with some students. Thinking about a more flexible architecture.
- REST interface around the database, to support Python and a more flexible JavaScript reporter. Hopefully will get that to a stage where people can play with it.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM