Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Chemical System #616

Open
wants to merge 89 commits into
base: protos
Choose a base branch
from

Conversation

MBartkowiakSTFC
Copy link
Collaborator

@MBartkowiakSTFC MBartkowiakSTFC commented Nov 29, 2024

Description of work
Most of the classes derived from ChemicalEntity were only used for a very specific case of a trajectory created by GROMACS with a PDB file describing the topology. Since MDANSE is moving to MDAnalysis for loading GROMACS trajectories, the selection mechanism based on fixed databases needs to be replaced with a more general solution. This is provided by rdkit and substructure matching. Since rdkit introduces its own indexing of atoms, not guaranteed to be consistent with the internal indexing of the ChemicalSystem class, changes are necessary to improve the consistency of the atom indexing.

closes #545
closes #351
closes #641
closes #478

Fixes

  1. ChemicalEntity and all derived classes have been removed from the code.
  2. The only database remaining in the Chemistry section is the atoms database.
  3. Atom indexing is now fixed and matches the indexing in the coordinate array.
  4. New ChemicalSystem implementation stores lists of indices.
  5. ChemicalSystem does not have a "configuration" attribute anymore. Configuration is accessed only via the Trajectory class.
  6. A backup ChemicalSystem loader allows the users to load old trajectories. The new ChemicalSystem instance will be created in memory and populated with information from the old /chemical_system data structure.
  7. The trajectory stores a subset of the atom database, preserving the user-made changes to atom definitions and making trajectories portable even if they use artificial atoms.
  8. Atom selection works for trajectories independent of the molecular dynamics engine used to run the simulation. However, it is necessary to detect bonds in the system using the TrajectoryEditor job before it is possible to select structures.
  9. Most of the Cython extensions have been replaced with Python equivalents.

To test
All unit tests must pass.
A detailed comparison of the results between this and main branches is necessary due to the fundamental changes to the way the code operates.
Please run TrajectoryEditor with molecule detection on a trajectory and check if all the expected Atom Selection options are working correctly in the GUI.

Jobs that require the most attention:

  • CenterOfMassTrajectory
  • Voronoi
  • SolventAccessibleSurface
  • PairDistributionFunction
  • VanHoveSelf
  • VanHoveDistinct

@ChiCheng45
Copy link
Collaborator

I'm getting some odd trajectories after running the trajectory editor with default values so no changes to the trajectory should occur. I'm using the NAMD dmpc_in_water trajectory.

Before
image

After (this branch)
image

After (protos)
image

@MBartkowiakSTFC
Copy link
Collaborator Author

COM trajectory to work well and analysis calculation with COM trajectory also seems to be working fine.

Just a few things about COM atoms and the 3D viewer. I ran the the COM trajectory on the discover cubane/buckyball system.

the atom radii are maybe too big? We also need to block the bonding between COM atoms similarly to dummy atoms, here is what I see when I set a smaller atom radius.

We also need to think about what the radius setting should mean exactly because it could mean vdw radii or covalent radii or something else.

Screenshot 2025-01-20 at 16 05 38
In the end I decided to define the radius of a COM atom as the SUM of two terms:

  1. the mass-weighted average of the atom distances from the centre of mass,
  2. the mean covalent radius of the atoms in the groups.
    The atom size scaling is still less than 1 (0.4 by default, so atoms don't block the view). If I change the atom scaling to 1.0, for the fullerene/cubane system I get the picture above.

@MBartkowiakSTFC
Copy link
Collaborator Author

I'm getting some odd trajectories after running the trajectory editor with default values so no changes to the trajectory should occur. I'm using the NAMD dmpc_in_water trajectory.

Two problems had to be addressed:

  1. MinimalPDBReader was grouping molecules wrong, adding a water molecule to a chain,
  2. contiguous_configuration normally shifts atoms to minimise their distance from the first atom in the group. I added an option to group them around the centre of the group instead. (Not the centre of mass, just the geometric centre). For now I made TrajectoryEditor use that option, as it is less likely to break long chains. We can discuss if we should always use this option or not. For now I left it as "false" to stay compatible with the old code.
    If you convert the trajectory again and run a null trajectory editor, nothing should change.

@ChiCheng45
Copy link
Collaborator

I'm not sure how this happened since the PDF function is the same on protos and this branch but there are some issues wth the SSF calculation on this branch. This worked before, I think d500f10 should be the first bad commit.

image

image

@MBartkowiakSTFC
Copy link
Collaborator Author

I'm not sure how this happened since the PDF function is the same on protos and this branch but there are some issues wth the SSF calculation on this branch. This worked before, I think d500f10 should be the first bad commit.

I completely missed that SSF also inherits from DistanceHistogram. PDF needed a change of one scaling factor in this PR. Now SSF has it too. For the trajectories I checked, the results matched those from protos.

@ChiCheng45
Copy link
Collaborator

Some differences between the RMSD results on protos and this branch.

image

Also the RMSD calculation is quite a bit slower on this branch.

@ChiCheng45
Copy link
Collaborator

RMSF job on this branch does not give the correct number of data points with the DFTB water trajectory and the molecule group selection.

On this branch there are only two data points, there should be 100 since there are 300 atoms. Also even with the atom group selection the results are different.

image

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Jan 24, 2025

Here is the EISF for the VASP trajectory, protos and this branch produce the same results.

image

Here is the EISF for the DFTB trajectory, protos and this branch produce different results.

image

I made sure the set the seed to 1 so they should generate the same q-vectors. I think there is some issue with trajectories with atoms that go outside of the unit cell for some calculations.

DFTB - some atoms go outside the unit cell
image

VASP trajectory - all atoms stay inside the unit cell
image

I think this is whats causing the RMSD and RMSF issues above. I used the VASP trajectory to test quite a few of the MDANSE jobs so there are probably a few more that do not work with the DFTB trajectory.

@MBartkowiakSTFC
Copy link
Collaborator Author

I have compared the results from protos to the current branch. It was correct for:

  • EISF, DISF
  • MSD atoms and molecules grouping,
  • RMSD, RMSF,
  • DOS, VACF, PACF
  • angular correlation,
  • infrared.
    DFTB water and LAMMPS CuSbS trajectories were used in the tests, since one has molecules going out of the box, and the other has clear jumps across the simulation boundary.

I think that the read_com_trajectory method in MdanseTrajectory was the main source of problems before, and it has been fixed now.

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Jan 29, 2025

I've done some more testing and it looks like there might be some issues with loading h5md files in this branch. I'm using MDANSE\Tests\UnitTests\Data\Ar_mdmc_h5md.h5.

When I load the h5md file I also get the following error.

"KeyError: "Unable to synchronously open object (object 'chemical_system' doesn't exist)""

but it seems to continue to work anyway.

Anyway here are the following results for PACF, RMSD, and RMSF.

image
image
image

@MBartkowiakSTFC
Copy link
Collaborator Author

I've done some more testing and it looks like there might be some issues with loading h5md files in this branch. I'm using MDANSE\Tests\UnitTests\Data\Ar_mdmc_h5md.h5.

When I load the h5md file I also get the following error.

"KeyError: "Unable to synchronously open object (object 'chemical_system' doesn't exist)""

but it seems to continue to work anyway.

True, this message was not really meant to be this dramatic. Now there is just a warning that MdanseTrajectory class could not read the file, so we will try H5MD instead.

Anyway here are the following results for PACF, RMSD, and RMSF.

I corrected it, now the results match for me.

@ChiCheng45
Copy link
Collaborator

I think that all the problems you highlighted so far have been fixed in the latest version of this PR. Also, now the MDTraj converter checks for bonds in the topology files, and uses them to identify molecules.

Thanks, yes looks good as far as I can tell all the converters are working correctly now.
One bug I've noticed is that when you delete an open trajectory from the GUI, the file still appears to be locked so I can't delete or move the file in Windows. Previously we were able to do this and we can still do this with the MDA files.

Looks like this is fixed now.

This is fixed when you load an already converted trajectory but I found one more edge case. When you convert with autoload and then delete it from the GUI the trajectory file is still locked so it can't be deleted or moved.

@MBartkowiakSTFC
Copy link
Collaborator Author

This is fixed when you load an already converted trajectory but I found one more edge case. When you convert with autoload and then delete it from the GUI the trajectory file is still locked so it can't be deleted or moved.

When does it go away? Do you have to switch off the GUI, or is it enough just to wait?

As far as I know this issue is Windows-specific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants