Skip to content

Electrode workflow and documentation improvements #1055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 24, 2025

Conversation

esoteric-ephemera
Copy link
Collaborator

@esoteric-ephemera esoteric-ephemera commented Nov 12, 2024

Workflow changes with @jmmshn:

  • Small change in the electrode insertion workflow to avoid needing large object storage: Currently there's a get_charge_density_job function which does not point its output to e.g. GridFS. That causes problems when trying to store a CHGCAR in normal mongo stores
    • Moved the get_charge_density_job within get_inserted_structures so that the charge density is not stored when generating inserted structures (it still may be stored in the user's GridFS from the static calc)
    • Improvements to labeling of workflow steps

Docs changes:

Misc:

@JaGeo
Copy link
Member

JaGeo commented Nov 12, 2024

@esoteric-ephemera does it make sense to add the data to tge datastore for other parts of the workflow? It will surely fail at some point

@esoteric-ephemera
Copy link
Collaborator Author

It should work in this setup because the charge density is only used in one part of the flow (get_inserted_structures). If the user wanted to retrieve their charge densities for analysis, the host structure static calculation should store them in the datastore as well (this static is required in the electrode flow)

Maybe @jmmshn has thoughts on this - don't want to break the flow logic

@@ -204,7 +202,8 @@ def get_insertion_electrode_doc(

@job
def get_inserted_structures(
chg: VolumetricData,
prev_dir: Path | str,
get_charge_density: Callable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_charge_density: Callable,
get_charge_density: Callable[[str | Path, VolumetricData]

@@ -213,7 +212,8 @@ def get_inserted_structures(

Parameters
----------
chg: The charge density.
prev_dir: The previous directory where the static calculation was performed.
get_charge_density: A function to get the charge density from a task document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_charge_density: A function to get the charge density from a task document.
get_charge_density: A function to get the charge density from a run directory.

@jmmshn
Copy link
Contributor

jmmshn commented Nov 14, 2024

@esoteric-ephemera, copying the file over was kind of a fireworks-related hack.
When this was added, whenever there was a large object that was the input of the job. The UI that rendered the fireworks wf would try to render the entire thing which basically crashed it every time. So you might want to confirm that this is no longer the case.

@esoteric-ephemera
Copy link
Collaborator Author

Thanks for the suggestions @janosh, wasn't aware of the type hinting for Callable!

Thanks, @jmmshn. This is probably a safer approach then, since get_inserted_structures no longer takes VolumetricData as input, just the path to the previous calc and a function to parse it

Copy link

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 4.13%. Comparing base (4244da9) to head (59b7f26).

Files with missing lines Patch % Lines
src/atomate2/common/jobs/electrode.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #1055       +/-   ##
==========================================
- Coverage   72.82%   4.13%   -68.69%     
==========================================
  Files         187     187               
  Lines       13637   13634        -3     
  Branches     1370    1370               
==========================================
- Hits         9931     564     -9367     
- Misses       3161   13039     +9878     
+ Partials      545      31      -514     
Files with missing lines Coverage Δ
src/atomate2/common/jobs/electrode.py 0.00% <0.00%> (-84.54%) ⬇️

... and 166 files with indirect coverage changes

@jmmshn
Copy link
Contributor

jmmshn commented Jan 7, 2025

Thanks for the suggestions @janosh, wasn't aware of the type hinting for Callable!

Thanks, @jmmshn. This is probably a safer approach then, since get_inserted_structures no longer takes VolumetricData as input, just the path to the previous calc and a function to parse it

@esoteric-ephemera , it looks like current version of get_charge_density_job

def get_charge_density_job(

breaks for large objects because it need to send the entire charge density to the jobstore.
I think this can be fixed by adding data=True but we will run into the same problem with jobflow + Firework,
since you now have the following charge density in the input:

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/common/jobs/electrode.py#L97C9-L97C16

So I don't see any way of not causing the problem with fireworks where the object is deserialized during viewing of the job in the webgui. Maybe it's best to move on completely from fireworks anyways.

So I think the choices are:

  1. Go back to the old implementation where the charge density is never "sent" between jobs so it never appears as in input.
  2. Just forget about any problems with view in fireworks... I have not used it in months and month and we should just worry about jobflow-remote.

I think we are already down the path of 2 mostly anyways so I can just complete refactor out the get_charge_density_job and send the charge density from static calculation directly.

@esoteric-ephemera
Copy link
Collaborator Author

Hey @jmmshn, the way I've rewritten it in the PR shouldn't hit either issue you've identified - it just slightly tweaks the electrode flow to take the path of the charge density file and a function as args. Should not have issues with storing large objects in Mongo, nor with rendering large objects in the fireworks webgui

We're still using fireworks, and I don't see a strong case to store the charge density in a database. Even in that case, the user can write a custom get_charge_density function that pulls from a database directly

@jmmshn
Copy link
Contributor

jmmshn commented Jan 7, 2025

So in my testing the following job fails with a "Document too large" error.

def get_charge_density_job(
prev_dir: Path | str,
get_charge_density: Callable,
) -> VolumetricData:
"""Get the charge density from a task document.
Parameters
----------
prev_dir: The previous directory where the static calculation was performed.
get_charge_density: A function to get the charge density from a task document.
Returns
-------
The charge density.
"""
return get_charge_density(prev_dir)

My current understanding of what is happening is that:

get_charge_density_job calls the VASP version of the function:

def get_charge_density(self, prev_dir: Path | str) -> VolumetricData:
"""Get the charge density of a structure.
Parameters
----------
prev_dir:
The previous directory where the static calculation was performed.
Returns
-------
The charge density.
"""
prev_dir = Path(strip_hostname(prev_dir))
aeccar0 = Chgcar.from_file(prev_dir / "AECCAR0.gz")
aeccar2 = Chgcar.from_file(prev_dir / "AECCAR2.gz")
return aeccar0 + aeccar2

So it will produced a document that contains a VolumetricData/Chgcar object.

It looks like the fact that this result does not have a data=[Chgcar, VolumetricData] in the decorator is the reason I'm seeing the failures.

PS: Thanks for the fast reply!

@esoteric-ephemera
Copy link
Collaborator Author

Yeah exactly what I've observed - adding Chgcar and VolumetricData is to the data field of job should work as well, but is there a need for storing the charge density?

@jmmshn
Copy link
Contributor

jmmshn commented Jan 8, 2025

but is there a need for storing the charge density?
I think that is for the case where the different nodes cannot see the same file system?
As I understand, you will have to store the charge density somewhere to be able to send it to the next job.
That might be a somewhat common usage case if people use more heterogenous compute which is where I think jobflowremote is going.

That is essentially the tradeoff, this is a substantial amount of new data though so it might make sense to make

  • "copy file over" the default behavior
  • "send object via jobstore" an optional behavior

@esoteric-ephemera
Copy link
Collaborator Author

OK - so in the case of heterogeneous compute for a VASP job, we'd have to use something like static_job.output.vasp_objects["aeccar0"] to access the AECCAR0 without making a direct query to the job store. I haven't tried to see if that's possible but I would expect it is?

Either way, I think this is worth looking at in a separate PR - the flow as it currently exists in atomate2 doesn't support heterogeneous compute, this PR is really just to ensure that the flow doesn't fail from mongodb object size limitations

@jmmshn
Copy link
Contributor

jmmshn commented Jan 9, 2025

OK everything makes sense now.
I agree. I think I was a bit confused cuz I thought the thing was already removed and somehow made it's way back in.
But I I now see that this PR was not merged. Sorry for the confusion on my part!

@jmmshn
Copy link
Contributor

jmmshn commented Jan 10, 2025

@esoteric-ephemera, I added some minor fixes to the job naming/numbering and I try help get this PR passed.

Thanks for fixing this and shelping me clear up my confusion. I think I made a mistaking while writing the original wf and missed a place where the data is serialized to the jobstore. I'll fix those among other things once this PR is merged!

@esoteric-ephemera
Copy link
Collaborator Author

No worries @jmmshn, that all sounds good to me

@jmmshn
Copy link
Contributor

jmmshn commented Jan 28, 2025

Hi @esoteric-ephemera, we fixed the abinit tests, I think this should work now.

@esoteric-ephemera
Copy link
Collaborator Author

Hey @JaGeo , @janosh , @utf , should be ready for review. Thanks @jmmshn for input on this!

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the expert here but looks good to me 👍

pyproject.toml Outdated
@@ -35,6 +35,7 @@ dependencies = [
"pydantic-settings>=2.0.3",
"pydantic>=2.0.1",
"pymatgen>=2024.11.13",
"pymongo<=4.10.1", # TODO: remove pending maggma fix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe link the relevant maggma issue/PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a couple mistakes here:

  • maggma actually requires pymongo==4.10.1 but 4.11 was getting installed in CI
  • The issue is mongomock missing a sort kwarg that was recently added in pymongo

Will update this

@esoteric-ephemera esoteric-ephemera changed the title Change how electrode flow retrieves charge density data Electrode workflow and documentation improvements Apr 23, 2025
@esoteric-ephemera
Copy link
Collaborator Author

esoteric-ephemera commented Apr 24, 2025

@JaGeo would you mind taking a quick look over the docs changes and see if that's what you had in mind for #1151 and #1152?

The openff tests seem to be OK now with the newer dependency stack, but I marked a few places with TODOs that will have to eventually be updated. Whenever you get a chance @orionarcher, would be super helpful if you could look these over

@JaGeo
Copy link
Member

JaGeo commented Apr 24, 2025

Thank you @esoteric-ephemera ! Those updates are great! (as usual).

@janosh janosh added docs Improvements or additions to documentation vasp Vienna Ab initio Simulation Package new wf PRs with new workflows labels Apr 24, 2025
@esoteric-ephemera esoteric-ephemera merged commit 0b66db3 into materialsproject:main Apr 24, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Improvements or additions to documentation new wf PRs with new workflows vasp Vienna Ab initio Simulation Package
Projects
None yet
4 participants