Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adios dataset name 319 #320

Open
wants to merge 42 commits into
base: main
Choose a base branch
from

Conversation

kshitij-v-mehta
Copy link
Collaborator

@kshitij-v-mehta kshitij-v-mehta commented Jan 27, 2025

Update Feb 03: PR ready for merging.

PR for review only. Not ready for merging yet.

@@ -288,6 +288,7 @@ def __getitem__(self, idx):
data = generate_graphdata_from_smilestr(
smilestr, ytarget, csce_node_types, var_config
)
data.dataset_name = 'csce'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jychoi-hpc @allaffa Should the dataset_name be added this way manually, or should we read it from the json file?

@kshitij-v-mehta kshitij-v-mehta marked this pull request as ready for review January 28, 2025 15:36
Copy link
Member

@jychoi-hpc jychoi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update. It looks good to me.


# Look for the dataset_name in any one of the Data samples and add it as an ADIOS attribute
dataset_name = self._get_dataset_name()
if dataset_name is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kshitij-v-mehta
Why do we need this if the dataset_name attributed is already take care of in the parent class AbstractBaseDataset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the ADIOS writer. It checks for the presence of the dataset_name in the object. If it exists, it adds it to the ADIOS file.
Similarly for the reader, if it finds dataset_name in the file, it reads it and adds it to the object. The get method of the parent class then reads this attribute.

@kshitij-v-mehta
Copy link
Collaborator Author

kshitij-v-mehta commented Feb 6, 2025

Can this PR be merged? The check is failing due to a formatting issue in
examples/multidataset_deepspeed/train.py, which is unrelated to this PR's code.

@kshitij-v-mehta
Copy link
Collaborator Author

@pzhanggit This PR adds the dataset name to the ADIOS file. Does it also need to be added to the Pickle output?

@pzhanggit
Copy link
Collaborator

@pzhanggit This PR adds the dataset name to the ADIOS file. Does it also need to be added to the Pickle output?

Thanks, Kshitij! I think so, as long as we continue to support Pickle. @allaffa what do you think?

@kshitij-v-mehta
Copy link
Collaborator Author

kshitij-v-mehta commented Feb 7, 2025

@pzhanggit This PR adds the dataset name to the ADIOS file. Does it also need to be added to the Pickle output?

Thanks, Kshitij! I think so, as long as we continue to support Pickle. @allaffa what do you think?

Never mind, I added it to the pickle output as well.

pzhanggit and others added 10 commits February 10, 2025 13:36
* add multiple branches to Base model

* adjust based on Max's new datasets with dataset_name

* add json for compute grad energy physics-informed force prediction

* revert time change

* reverse prior change

* Add check to ensure correct output dim for energy in physics-informed force prediction

* Implement the check after setting var_config

* revert change

* Shorten comment

---------

Co-authored-by: Rylie Weaver <[email protected]>
Co-authored-by: Rylie Weaver <[email protected]>
* data attributes updated for consistency across datasets

* non-normalized chemical composition added as data attribute

* download_dataset.sh added for transition1x example

* download dataset flag updated

* scripts updated

* development of tranistion1x scripts continues

* transiton1x scripts completed

* black formatting fixed

* printouts removed

* parallelizatin of data reading introduced

* blsck formatting fixed

* detach().clone() used to defined normalized energy per atom and black formatting

* add compute_grad_energy=False as explicit argument

* add data name as attributed to each data object

* compute_grad_energy is parsed as input argument with default value set to False

* edge_index, edge_attr, and edge_shifts explicitly itnroduced in the definition of the Data object

* changed data.force into data.forces for ani1x and qm7x examples

* smiles_string added as data attribute

* remove redundant logic on energy normalization from omat24 example

* force threshold value increased to 1000 for ani-1x

* Reverted smiles_utils.py to version from commit 3c3c434

* xyz2mol functionalities put in a separate file

* download dataset script added for qm7x

* renamed data.force as data.forces in ani1x

* natoms converted into a tensor

* verbosity level ntroduced for ani1x

* Z corrected into atomic_numbers for qm7x example

* bug fixed for data attributes in transition1x

* try-except in transition1x rescoped

* transform coordinates fixed in transition1x

* iterate_tqdm used in utils.create_graph_data for transition1x example

* total_enerfgy replaced with energy

---------

Co-authored-by: Massimiliano Lupo Pasini <[email protected]>
Co-authored-by: Massimiliano Lupo Pasini <[email protected]>
* Add multidataset example with deepspeed support

* Change base.json to follow GPS' requirement
allaffa and others added 24 commits February 11, 2025 15:00
…e and shape for each attribute, and appropriate positional transform
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants