Performance significantly affected by "flatness" of the compounds #757
Comments
It seems that if I set
Yeah, I can see that issue, the
Hey @chrisjonesBSU, thanks for this issue. These examples are nicely fleshed out. The GMSO apply step relies on a lot of assumptions about repeated structure in a topology. My guess is that, for some reason, the polymer builder gets the residues right in the flat molecule but not in the twisted chain (looking just at example 2). You might gain a lot of information by printing out … You can also see if the … The identify_connected_components option is only good for lots of small molecules; small numbers of large molecules are going to be inherently bad for it. In "most" cases it gives a big speedup, which is why it's the default, though I can see a reason to change that default. For this specific case (a polymer with many sub-monomers), the ideal way to apply is actually with this argument turned on:
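The trade-off described above can be illustrated with a minimal pure-Python sketch. It is not GMSO's implementation, and all function names are hypothetical. It splits a bond graph into connected components and groups identical ones, so each unique molecule would only need to be typed once. A cheap invariant (sorted element/degree pairs) stands in for the networkx graph comparison mentioned in this thread; equal invariants do not prove that two graphs are isomorphic.

```python
from collections import defaultdict, deque


def connected_components(adjacency):
    """Yield sets of atom indices, one per connected component (BFS)."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        yield component


def signature(component, adjacency, elements):
    """Cheap graph invariant: sorted (element, degree) pairs.

    Equal signatures are necessary but NOT sufficient for isomorphism;
    a real implementation would confirm with a full isomorphism test.
    """
    return tuple(sorted((elements[a], len(adjacency[a])) for a in component))


def group_components(adjacency, elements):
    """Map each signature to the number of components that share it."""
    groups = defaultdict(int)
    for component in connected_components(adjacency):
        groups[signature(component, adjacency, elements)] += 1
    return groups


def chains(n_chains, length):
    """Build n_chains disjoint linear chains of carbon atoms."""
    adjacency, elements = {}, {}
    for c in range(n_chains):
        atoms = list(range(c * length, (c + 1) * length))
        for i, a in enumerate(atoms):
            elements[a] = "C"
            adjacency[a] = [atoms[j] for j in (i - 1, i + 1) if 0 <= j < length]
    return adjacency, elements


many_small = group_components(*chains(100, 5))  # 100 identical small molecules
one_large = group_components(*chains(1, 500))   # a single long polymer chain
print(len(many_small), list(many_small.values()))  # one group, reused 100 times
print(len(one_large), list(one_large.values()))    # one group, used only once
```

With 100 identical small molecules, a single group is reused 100 times; with one long chain, the grouping step does all of its work and nothing is reused, which matches the point that this option pays off for many small molecules rather than a few large ones.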
A difference between SMILES and the polymer builder is the overall packaging (i.e., the hierarchy). Could you try using
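To make the "packaging" difference concrete, here is a toy sketch using a hypothetical Node class as a stand-in for a compound; it is not mbuild's API, and the sizes are illustrative only. A molecule loaded from SMILES can come out as one compound holding its atoms directly, while the polymer builder nests atoms inside monomer compounds, so the same chemistry arrives with a different hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a compound in a hierarchy."""
    name: str
    children: list = field(default_factory=list)


def describe(node, depth=0):
    """Return one indented line per node, showing the hierarchy."""
    lines = ["  " * depth + f"{node.name} ({len(node.children)} children)"]
    for child in node.children:
        lines.extend(describe(child, depth + 1))
    return lines


def max_depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((max_depth(c) for c in node.children), default=0)


# Same six atoms, two packagings.
from_smiles = Node("Compound", [Node("C") for _ in range(6)])
from_builder = Node(
    "Polymer", [Node("Monomer", [Node("C") for _ in range(2)]) for _ in range(3)]
)
print("\n".join(describe(from_smiles)))
print("\n".join(describe(from_builder)))
print(max_depth(from_smiles), max_depth(from_builder))  # flat vs. nested
```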
Thanks for the detailed responses. It seems like a case could be made for either default value for … Whatever choice is made for the default value, maybe we could add some warnings where possible, and definitely more information in the docstrings? @chrisiacovella
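One way the suggested warning could look, as a sketch: a hypothetical helper (not part of GMSO) that warns when connected-component grouping is enabled but the topology contains a very large molecule, where grouping offers little reuse. The 1000-atom threshold is an arbitrary placeholder, not a value taken from GMSO.

```python
import warnings


def check_component_grouping(molecule_sizes, identify_connected_components=True,
                             max_atoms_per_molecule=1000):
    """Warn if connected-component grouping is enabled for large molecules.

    `molecule_sizes` holds the atom count of each molecule in the topology.
    Returns True when a warning was emitted, False otherwise.
    """
    if not identify_connected_components or not molecule_sizes:
        return False
    largest = max(molecule_sizes)
    if largest > max_atoms_per_molecule:
        warnings.warn(
            f"Largest molecule has {largest} atoms; identify_connected_components "
            "gives little reuse for a few large molecules and may be slow.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# A box of 500 small solvent molecules: no warning.
check_component_grouping([3] * 500)
# A single long chain: emits a UserWarning.
check_component_grouping([5000])
```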
One more point: the docstring for use_molecule_info in the apply step claims it is unused, but it actually is used. In fact, you can turn off identify_connected_components if the molecules are all already labeled correctly in the topology, and then you don't need to compare the networkx graphs at all. Here is where it gets used, and here are the details about it in the parameterizer
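A minimal sketch of the idea in this comment, assuming each site already carries a correct (molecule_name, instance_number) label. The label format and the function name are hypothetical, not GMSO's data model. Trusting the labels replaces the graph comparison entirely: one instance per molecule name is typed, and its result is copied to the other instances.

```python
from collections import defaultdict


def representatives_by_label(sites):
    """Pick one molecule instance per molecule name to be typed.

    `sites` is a list of (site_index, (molecule_name, instance_number))
    pairs, i.e. labels assigned correctly when the topology was built.
    Every instance with the same name is assumed to be identical.
    """
    instances = defaultdict(lambda: defaultdict(list))
    for index, (name, number) in sites:
        instances[name][number].append(index)
    plan = {}
    for name, by_number in instances.items():
        first = min(by_number)  # lowest instance number is the representative
        plan[name] = {
            "type_sites": by_number[first],  # typed once
            "copy_to": [by_number[n] for n in sorted(by_number) if n != first],
        }
    return plan


# Hypothetical labels: two 3-site water molecules and one 4-site chain.
sites = [
    (0, ("water", 0)), (1, ("water", 0)), (2, ("water", 0)),
    (3, ("water", 1)), (4, ("water", 1)), (5, ("water", 1)),
    (6, ("chain", 0)), (7, ("chain", 0)), (8, ("chain", 0)), (9, ("chain", 0)),
]
plan = representatives_by_label(sites)
print(plan["water"])  # type sites 0-2 once, copy their types to sites 3-5
print(plan["chain"])  # a single chain: everything is typed, nothing is copied
```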
@marjanAlbouye and I were experiencing some very slow performance with gmso.parameterization.apply and have been digging into what the problem could be. It doesn't seem to be an issue of the number of particles or molecules, and given the examples below, the only explanation I can think of is that flat, planar molecules (or in this case, polymers) are causing issues.
I'm curious whether anyone else can reproduce this or has any ideas about what the issue could be. Possibly something related to graphs and subgraphs as a function of topology size and structure?
Examples:
Example 1: A 20-mer poly(ethylene) created from SMILES compared to a 20-mer created with the Polymer builder.
I don't think it is an issue specific to the Polymer builder.
Example 2: Poly(benzene) from the Polymer builder
Here, I chose different values for the bonding indices: one gives a twisty chain, the other a perfectly flat, planar chain. The twisty one finishes in a reasonable amount of time; the flat one never finishes.
I'm not super familiar with the graph/networkx operations GMSO is doing under the hood, but I wonder if there are some edge cases where it gets stuck in an infinite while or for loop. For context, I have a bigger system than these examples (but not crazy big, only 13,000 particles) running on a cluster that has now spent almost 20 hours in the apply step, which is what prompted this issue. The polymer chains in that system are totally flat, like in this benzene example.
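To tell a genuinely stuck loop from a step that is merely very slow, one option is to profile the apply call on smaller systems that do finish (for example, shorter flat and twisty chains) and compare where the time goes. The sketch below wraps an arbitrary callable with the standard library's cProfile; profile_call is a hypothetical helper and slow_sum is a placeholder workload, not GMSO code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top_n=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report).

    The report lists the `top_n` entries sorted by cumulative time, showing
    which functions dominate the run and how many times each was called.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top_n)
    return result, stream.getvalue()


def slow_sum(n):
    """Placeholder workload; in practice, wrap the slow apply call instead."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(result)
print(report)
```

For a run that never returns, the standard library's faulthandler.dump_traceback_later(timeout, repeat=True) can periodically print the current stack of every thread, which shows whether execution keeps cycling through the same frames.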