
Added new API support for nextgen Materials Project #692

Merged · 21 commits · Feb 7, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -36,6 +36,8 @@ wheels/
*.egg-info/
.installed.cfg
*.egg
.vscode
.github

# PyInstaller
# Usually these files are written by a python script from a template
@@ -99,6 +101,7 @@ celerybeat-schedule
.venv
venv/
ENV/
env/

# Spyder project settings
.spyderproject
2 changes: 1 addition & 1 deletion README.md
@@ -122,7 +122,7 @@ spktrain experiment=qm9_atomwise run.data_dir=<path> model/representation=painn
```

For more details on config groups, have a look at the
[Hydra docs](https://hydra.cc/docs/next/tutorials/basic/your_first_app/config_groups).
[Hydra docs](https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/).


### Example 2: Potential energy surfaces
Binary file added builder.docx
stefaanhessmann marked this conversation as resolved.
Binary file not shown.
6 changes: 3 additions & 3 deletions docs/getstarted.rst
@@ -76,9 +76,9 @@ All values of the config can be changed from the command line, including the dir
By default, the model is stored in a directory with a unique run id hash as a subdirectory of ``spk_workdir/runs``.
This can be changed as follows::

$ spktrain experiment=qm9 run.data_dir=/my/data/dir run.path=~/all_my_runs run.id=this_run
$ spktrain experiment=qm9_atomwise run.data_dir=/my/data/dir run.path=~/all_my_runs run.id=this_run

If you call ``spktrain experiment=qm9 --help``, you can see the full config with all the parameters
If you call ``spktrain experiment=qm9_atomwise --help``, you can see the full config with all the parameters
that can be changed.
Nested parameters can be changed as follows::

@@ -114,7 +114,7 @@ If you would want to additionally change some value of this group, you could use
$ spktrain experiment=qm9_atomwise data_dir=<path> model/representation=painn model.representation.n_interactions=5

For more details on config groups, have a look at the
`Hydra docs <https://hydra.cc/docs/next/tutorials/basic/your_first_app/config_groups>`_.
`Hydra docs <https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/>`_.


Example 2: Potential energy surfaces
58 changes: 29 additions & 29 deletions examples/tutorials/tutorial_01_preparing_data.ipynb
Collaborator: What is changed in this notebook? Is it on purpose?

@sundusaijaz (Collaborator, Author), Feb 6, 2025: "no, It was not mine"

@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -348,18 +348,21 @@
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To get a better initialization of the network and avoid numerical issues, we often want to make use of simple statistics of our target properties. The most simple approach is to subtract the mean value of our target property from the labels before training such that the neural networks only have to learn the difference from the mean prediction. A more sophisticated approach is to use so-called atomic reference values that provide basic statistics of our target property based on the atom types in a structure. This is especially useful for extensive properties such as the energy, where the single atom energies contribute a major part to the overall value. If your data comes with atomic reference values, you can add them to the metadata of your `ase` database. The statistics have to be stored in a dictionary with the property names as keys and the atomic reference values as lists where the list indices match the atomic numbers. For further explanation please have a look at the [QM9 tutorial](https://schnetpack.readthedocs.io/en/latest/tutorials/tutorial_02_qm9.html).\n",
"\n",
"Here is an example:"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# calculate this at the same level of theory as your data\n",
@@ -376,19 +379,16 @@
"# property_unit_dict={'energy':'kcal/mol'},\n",
"# atomref=atomref\n",
"# )"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"source": [
"In our concrete case, we only have an MD trajectory of a single system. Therefore, we don't need to specify an atomref, since removing the average energy will work as well."
],
"metadata": {
"collapsed": false
}
},
"source": [
"In our concrete case, we only have an MD trajectory of a single system. Therefore, we don't need to specify an atomref, since removing the average energy will work as well."
]
},
{
"cell_type": "markdown",
@@ -447,17 +447,21 @@
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Using your data for training\n",
"We have now used the class `ASEAtomsData` to create a new `ase` database for our custom data. `schnetpack.data.ASEAtomsData` is a subclass of `pytorch.data.Dataset` and could be utilized for training models with `pytorch`. However, we use `pytorch-lightning` to conveniently handle the training procedure for us. This requires us to wrap the dataset in a [LightningDataModule](https://lightning.ai/docs/pytorch/stable/data/datamodule.html). We provide a general purpose `AtomsDataModule` for atomic systems in `schnetpack.data.datamodule.AtomsDataModule`. The data module will handle the unit conversion, splitting, batching and the preprocessing of the data with `transforms`. We can instantiate the data module for our custom dataset with:"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"is_executing": true
},
"outputs": [],
"source": [
"import schnetpack as spk\n",
@@ -480,27 +484,23 @@
")\n",
"custom_data.prepare_data()\n",
"custom_data.setup()"
],
"metadata": {
"collapsed": false,
"is_executing": true
}
]
},
{
"cell_type": "markdown",
"source": [
"Please note that for the general case it makes sense to use your dataset within the command line interface (see: [here](https://schnetpack.readthedocs.io/en/latest/userguide/configs.html)). For some benchmark datasets we provide data modules with download functions and more utilities in `schnetpack.data.datasets`. Further examples on how to use the data modules are provided in the following sections.\n"
],
"metadata": {
"collapsed": false
}
},
"source": [
"Please note that for the general case it makes sense to use your dataset within the command line interface (see: [here](https://schnetpack.readthedocs.io/en/latest/userguide/configs.html)). For some benchmark datasets we provide data modules with download functions and more utilities in `schnetpack.data.datasets`. Further examples on how to use the data modules are provided in the following sections.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:spkdev] *",
"display_name": "Python 3",
"language": "python",
"name": "conda-env-spkdev-py"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -512,7 +512,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.11"
"version": "3.12.0"
},
"nbsphinx": {
"execute": "never"
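For the atomref convention that this notebook describes (a dict keyed by property name, with per-element reference values in a list indexed by atomic number), a minimal standalone sketch could look like the following; the numeric values are placeholders for illustration, not real reference energies:

```python
# Sketch of the atomref metadata layout described in the notebook.
# Values are illustrative placeholders, not real reference energies.

n_elements = 100  # list index == atomic number, so index 0 stays unused

# One list per property; list[Z] holds the reference value for element Z.
atomref = {"energy": [0.0] * n_elements}
atomref["energy"][1] = -313.5    # hypothetical single-atom energy for H (Z=1)
atomref["energy"][6] = -23622.0  # hypothetical single-atom energy for C (Z=6)

# The dict can then be passed as metadata when creating the database,
# e.g. ASEAtomsData.create(..., atomref=atomref) as in the commented cell above.
print(atomref["energy"][1])
```

Because the list index is the atomic number, the lists must be long enough to cover the heaviest element in the dataset.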
9 changes: 5 additions & 4 deletions pyproject.toml
@@ -19,15 +19,15 @@ authors = [
description = "SchNetPack - Deep Neural Networks for Atomistic Systems"
readme = "README.md"
license = { file="LICENSE" }
requires-python = ">=3.10"
requires-python = "==3.12"
dependencies = [
"numpy>=2.0.0",
"sympy<=1.12",
"sympy>=1.13",
"ase>=3.21",
"h5py",
"pyyaml",
"hydra-core>=1.1.0",
"torch>=1.9",
"torch>=2.5.0",
"pytorch_lightning>=2.0.0",
"torchmetrics",
"hydra-colorlog>=1.1.0",
@@ -41,7 +41,8 @@ dependencies = [
"tqdm",
"pre-commit",
"black",
"protobuf"
"protobuf",
"progressbar"
]

[project.optional-dependencies]
9 changes: 9 additions & 0 deletions src/schnetpack/configs/data/qm7x.yaml
@@ -0,0 +1,9 @@
defaults:
- custom

_target_: schnetpack.datasets.QM7X

datapath: ${run.data_dir}/qm7x.db # data_dir is specified in train.yaml
batch_size: 100
num_train: 5550
num_val: 700
45 changes: 35 additions & 10 deletions src/schnetpack/data/atoms.py
@@ -176,6 +176,7 @@ def add_systems(
self,
property_list: List[Dict[str, Any]],
atoms_list: Optional[List[Atoms]] = None,
key_value_list: Optional[List[Dict[str, Any]]] = None,
):
pass

@@ -463,6 +464,7 @@ def add_systems(
self,
property_list: List[Dict[str, Any]],
atoms_list: Optional[List[Atoms]] = None,
key_value_list: Optional[List[Dict[str, Any]]] = None,
):
"""
Add atoms data to the dataset.
@@ -475,14 +477,31 @@
order as corresponding list of `atoms`.
Keys have to match the `available_properties` of the dataset
plus additional structure properties, if atoms is None.
key_value_list: Properties as list of key-value pairs in the same
order as corresponding list of `atoms`.
Keys have to match the `available_properties` of the dataset
plus additional structure properties, if atoms is None.
"""
if atoms_list is None:
atoms_list = [None] * len(property_list)

for at, prop in zip(atoms_list, property_list):
self._add_system(self.conn, at, **prop)
# for at, prop in zip(atoms_list, property_list):
# self._add_system(self.conn, at, **prop)
for at, prop, key_val in zip(atoms_list, property_list, key_value_list):
self._add_system(
self.conn,
at,
key_val,
**prop,
)

def _add_system(self, conn, atoms: Optional[Atoms] = None, **properties):
def _add_system(
self,
conn,
atoms: Optional[Atoms] = None,
key_val: Optional[Dict[str, Any]] = None,
**properties,
):
"""Add systems to DB"""
if atoms is None:
try:
@@ -499,12 +518,7 @@ def _add_system(self, conn, atoms: Optional[Atoms] = None, **properties):
# add available properties to database
valid_props = set().union(
conn.metadata["_property_unit_dict"].keys(),
[
structure.Z,
structure.R,
structure.cell,
structure.pbc,
],
[structure.Z, structure.R, structure.cell, structure.pbc],
)
for prop in properties:
if prop not in valid_props:
@@ -514,11 +528,22 @@
+ f"provided together with its unit when calling "
+ f"AseAtomsData.create()."
)
for key in key_val:
if key not in valid_props:
logger.warning(
f"Property `{key}` is not a defined property for this dataset and "
+ f"will be ignored. If it should be included, it has to be "
+ f"provided together with its unit when calling "
+ f"AseAtomsData.create()."
)

data = {}
for pname in conn.metadata["_property_unit_dict"].keys():
try:
data[pname] = properties[pname]
if pname in properties:
data[pname] = properties[pname]
if pname in key_val:
data[pname] = key_val[pname]
except:
raise AtomsDataError("Required property missing:" + pname)

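The intent of the new `key_value_list` path in `_add_system` (merge explicit properties with an optional key-value dict, warning about and skipping keys the dataset does not define) can be sketched in isolation. `merge_system_data` and all sample values here are illustrative stand-ins, not part of the schnetpack API:

```python
import warnings
from typing import Any, Dict, Optional, Set


def merge_system_data(
    valid_props: Set[str],
    key_val: Optional[Dict[str, Any]] = None,
    **properties: Any,
) -> Dict[str, Any]:
    """Combine keyword properties with an optional key-value dict,
    warning about and dropping keys the dataset does not define."""
    key_val = key_val or {}  # guard against the None default
    data: Dict[str, Any] = {}
    for source in (properties, key_val):
        for name, value in source.items():
            if name not in valid_props:
                warnings.warn(f"Property `{name}` is not defined and will be ignored.")
                continue
            data[name] = value  # key_val entries overwrite duplicate keys, as in the diff
    return data


merged = merge_system_data(
    {"energy", "forces"},
    key_val={"energy": -1.0},
    forces=[[0.0, 0.0, 0.0]],
    charge=0,  # not a valid property -> warned about and dropped
)
print(merged)
```

Note that in the diff itself, `add_systems` zips `key_value_list` directly despite its `None` default, so the sketch substitutes an empty dict before iterating.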
4 changes: 2 additions & 2 deletions src/schnetpack/data/loader.py
@@ -3,7 +3,7 @@

from typing import Optional, Sequence
from torch.utils.data import Dataset, Sampler
from torch.utils.data.dataloader import _collate_fn_t, T_co
from torch.utils.data.dataloader import _collate_fn_t, _T_co

import schnetpack.properties as structure

@@ -63,7 +63,7 @@ class AtomsLoader(DataLoader):

def __init__(
self,
dataset: Dataset[T_co],
dataset: Dataset[_T_co],
batch_size: Optional[int] = 1,
shuffle: bool = False,
sampler: Optional[Sampler[int]] = None,
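The switch from `T_co` to `_T_co` tracks newer torch releases, where the covariant element TypeVar of `Dataset` became a private name. The typing pattern itself is plain standard-library machinery and can be sketched without torch; the `Dataset` class below is a stand-in for illustration, not the torch class:

```python
from typing import Generic, List, TypeVar

# Covariant: a Dataset[Subclass] can be used where Dataset[Baseclass] is expected.
_T_co = TypeVar("_T_co", covariant=True)


class Dataset(Generic[_T_co]):
    """Minimal stand-in for torch.utils.data.Dataset[_T_co]."""

    def __init__(self, items: List):
        self._items = list(items)

    def __getitem__(self, index: int) -> _T_co:
        # The covariant TypeVar may only appear in return positions like this one.
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)


ds: "Dataset[int]" = Dataset([1, 2, 3])
print(len(ds), ds[0])
```

A downside of the rename is that `_T_co` is a private torch symbol, so importing it couples this module to torch internals that may move again.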
2 changes: 1 addition & 1 deletion src/schnetpack/data/splitting.py
@@ -3,7 +3,7 @@
import torch
import numpy as np

__all__ = ["SplittingStrategy", "RandomSplit", "SubsamplePartitions"]
__all__ = ["SplittingStrategy", "RandomSplit", "SubsamplePartitions", "GroupSplit"]


def absolute_split_sizes(dsize: int, split_sizes: List[int]) -> List[int]:
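`GroupSplit` is newly exported here, but its implementation is not part of this hunk, so the following is only a guess at the general idea behind group-aware splitting (every sample of a group lands in the same partition); it is not the actual schnetpack class:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_split(
    group_ids: Sequence[int],
    num_train_groups: int,
    num_val_groups: int,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices so that no group is shared between partitions."""
    by_group: Dict[int, List[int]] = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        by_group[gid].append(idx)

    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)  # seeded shuffle for reproducibility

    train_g = groups[:num_train_groups]
    val_g = groups[num_train_groups:num_train_groups + num_val_groups]
    test_g = groups[num_train_groups + num_val_groups:]

    def collect(gs: List[int]) -> List[int]:
        return [i for g in gs for i in by_group[g]]

    return collect(train_g), collect(val_g), collect(test_g)


# Six samples in three groups; each partition receives whole groups only.
train, val, test = group_split([0, 0, 1, 1, 2, 2], num_train_groups=1, num_val_groups=1)
```

Grouping by, say, molecular formula prevents near-duplicate conformations of one system from leaking between train and test.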
1 change: 1 addition & 0 deletions src/schnetpack/datasets/__init__.py
@@ -7,3 +7,4 @@
from .materials_project import *
from .omdb import *
from .tmqm import *
from .qm7x import *