Skip to content

daisy-chained example #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 56 commits into from
Apr 6, 2023
Merged

daisy-chained example #222

merged 56 commits into from
Apr 6, 2023

Conversation

gelzinyte
Copy link
Contributor

This is (an unpolished draft of) the long daisy-chained workflow example that involves quite a few different operations and different ways to interact with them.

@simonwengert, @bernstei

  • what do you think in general?
  • Any parts to skip or parts to add?
  • Any better unit-tests than just checking for presence of files?
  • Originally, this showed how to set remote execution via environment variable and python. Currently I have left the in-python remote execution, but am not actually using it in the end for github-workflow purposes. Any better way to showcase this?
  • I left gap fitting in, but now I think that ACE fitting would be better, since we already have an iterative-fitting GAP example?
  • Do you think much more explanation is needed or can/should we treat this more as a copy-and-paste-code-example-pool and expect people to have a sense of what's happening from the rest of documentation? (I would prefer the less verbose later.)

@simonwengert thanks for working out how to document and test jupyter notebooks - writing this example was much easier than the old ones!

@gelzinyte
Copy link
Contributor Author

Note to myself on to do: come back to the clumsy piece of code that calculates atomization energies. I would like to be able to overwrite an existing file rather than creating new ones. Need to check if PR #210 allows to do this or if more modifications would be needed.

@simonwengert
Copy link
Contributor

simonwengert commented Feb 3, 2023

what do you think in general?

In general I really like this long-ish example. It's well structures and in my opinion its easy to understand what's going on and why.

Any parts to skip or parts to add?

We could save one part of the example by shifting the Filter out unreasonable structures into the Run Molecular Dynamics simulation part by directly selecting the structures during the MD run. In addition to saving one part, we then also showcase the traj_select*-functionality for MD runs. And we still have a representative of wfl.select due to the CUR sampling below.

Any better unit-tests than just checking for presence of files?

When initializing the MD with a defined set of velocities, is the rest of the example then deterministic? If this is the case, we could check if the obtained errors at the end are as expected.

I left gap fitting in, but now I think that ACE fitting would be better, since we already have an iterative-fitting GAP example?

Indeed, having an ACE example would be nice. Either with this example or a separate one. Since it's a long example and we don't want to lose people on the way GAP might be a preferred choice. I guess there are more people that have some experience with GAP then with ACE.

@gelzinyte
Copy link
Contributor Author

We could save one part of the example by shifting the Filter out unreasonable structures into the Run Molecular Dynamics simulation part by directly selecting the structures during the MD run. In addition to saving one part, we then also showcase the traj_select*-functionality for MD runs. And we still have a representative of wfl.select due to the CUR sampling below.

I like this, will add!

When initializing the MD with a defined set of velocities, is the rest of the example then deterministic? If this is the case, we could check if the obtained errors at the end are as expected.

That would mean loading the trajectory start from file, which I think doesn't (by default) store enough significant digits. Also quite like that the tutorial doesn't depend on any files. But I agree, it would be nice to check the errors - maybe add a very loose assert on the RMSEs?

@bernstei
Copy link
Contributor

bernstei commented Feb 3, 2023

Why not just set the random seed?

" # set the cell for gap_fit\n",
" at.cell = [50, 50, 50]\n",
"\n",
" # gap_fit cannot parse the xyz correctly with the SOAP descriptor\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, it just complains about malformed xyz:

SYSTEM ABORT: Traceback (most recent call last)
File "/project/src/libAtoms/xyz.c", line 281 kind IO
xyz_find_frames: malformed XYZ file /tmp/206641.1.any/_GAP_fitting_configs.bh4j0b_j.xyz at frame 1

STOP 1

The same file has dipoles in the same format as soap (string of floats), so could it exceed character limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the file? I could take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that'd would be great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - the immediate problem is that the header line is too long. That's trivial to fix in libAtoms, but it doesn't actually help, because the fortran xyz reader only allows info float fields to be length 1, 3, 6, or 9. I think the new xyz reader fixes that, but I don't know how close we are to using it. As a result, it chokes on the SOAP field either way. I think we need to clean up the descriptor in this example, but I'll try to push QUIP to fix this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearer reporting is very helpful, thanks.

It seems that difference is because of different architectures gap_fit has been run on? The gap coefficients and predictions vary tiny bit from fit to fit, even with a given random seed. Also I get slightly different results when running notebook directly vs with pytest (which I didn't expect?). We could have CI values as reference, but then local tests would fail, so I'll relax the tolerance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the difference between local notebook and local pytest starts with RDKit geometries, even if they're generated the same whenever the notebook is re-run. I don't think I can set a random seed (Python RDKit is a wrapper to C++). Since this is mostly testing the workflow, I think very loose tolerance (0.1 eV or eV/Å) is the optimal choice? This should pick out drastic breaking changes...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sometimes you need loose tolerances, or even testing some other (aggregate or averaged) quantity if you can't control some randomness. We're having a bad time with that in ACEHAL, for reasons that I understand even less.

If you can reduce generating the initial non-deterministic configuration it to something simple, I can take a look as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried reading initial structures from file, but MD wasn't deterministic with set random seed. Or at least the notebook and pytest MD trajectories were different, didn't check if they were deterministic from run to run. I'm guessing that there's not enough significant figures to fill a double saved in xyz and the rest is filled with noise?

Could try saving/loading the initial structures from e.g. ase.db, but not sure that's worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MD isn't deterministic if you're doing any parallelization unless you set the env var WFL_DETERMINISTIC_HACK to anything (which the gap rss iter fit test does, but isn't done across the board). I guess we should consider fixing the non-determinism itself if possible, which I think I know how to do but it'd be tricky.

If that's the only issue, you could try setting that env var.

@bernstei bernstei force-pushed the gap_fit_wfl_example branch from 99365e4 to 39112ea Compare March 8, 2023 15:21
@gelzinyte
Copy link
Contributor Author

Now everything is deterministic and error check passes with a default (tighter than before) threshold. Can this be merged @bernstei ?

@bernstei
Copy link
Contributor

bernstei commented Apr 6, 2023

I'm happy to merge, yes.

@gelzinyte gelzinyte merged commit 39eca4a into main Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants