Feature/virtual nodes #26

Open · wants to merge 146 commits into main

Conversation

@sidnb13 (Member) commented May 22, 2023

Summary

This is a very large PR: it contains all of the current work on virtual-node GNNs, along with a variety of improvements and features for the preprocessing and config-management pipelines. Details on the integrated features are below. They may not be comprehensive, so I'll update and expand them based on feedback. The main open issues to discuss:

  • Updating example configs to reflect the new developments, and creating documentation to match.
  • Ensuring that the new features don't break existing pipelines (i.e. backward compatibility).

General

  • Conceptual changes to the main.py entrypoint, adding support for WandB sweeps. MatDeepLearn can now serve as 1) the original functionality, in which a config file is specified for a task; 2) the entrypoint for an automated sweep, in which the config is read from WandB rather than a file and the task is self-contained; and 3) a generator of sweeps given a sweep configuration. The last of these integrates with jobs (discussed next): rather than training immediately, it generates the commands needed to launch asynchronous sweeps. Sweeps can be run sequentially or in parallel (this feature is still being ironed out); parallel is recommended, since runs are prone to failure.
  • The concept of jobs, which are run either locally or via Slurm and can be integrated into the registry. Each job type creates a custom entrypoint, either a command or a batch file to be run by the user. MatDeepLearn can operate in "job-script" mode, where it generates these entrypoints for the user to run; this is useful for iterative experiments. Jobs integrate with sweeps, as discussed above (a rough sketch of the job-script idea follows this list).
  • New custom dataclasses derived from torch_geometric.data.Data. These implement custom batching routines to handle the new attributes introduced by virtual nodes (see the batching sketch after this list).
  • Additional utilities implementing complex dictionary-merging features and profiling methods.
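
To make "job-script" mode concrete, here is a minimal sketch under assumptions, not the PR's actual generator: `write_slurm_script`, the sbatch directive values, and the example command are all illustrative placeholders rather than MatDeepLearn's real API.

```python
# Hypothetical sketch of "job-script" mode: instead of training immediately,
# emit a batch file for the user to submit with `sbatch`.
from pathlib import Path

SBATCH_TEMPLATE = """\
#!/bin/bash
#SBATCH --job-name={name}
#SBATCH --gres=gpu:1
#SBATCH --time=24:00:00

{command}
"""

def write_slurm_script(name: str, command: str, out_dir: str = ".") -> Path:
    """Write an sbatch entrypoint and return its path."""
    path = Path(out_dir) / f"{name}.sbatch"
    path.write_text(SBATCH_TEMPLATE.format(name=name, command=command))
    return path

# e.g. one generated entrypoint per sweep run (command string is illustrative):
script = write_slurm_script("vn_sweep_0", "python main.py --config example.yml")
```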
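
And a minimal sketch of the kind of batching customization the custom dataclasses need (not the actual MatDeepLearn classes; `VirtualNodeData`, `z_vn`, and `edge_index_vn` are illustrative names): PyG collates a batch by concatenating attributes and offsetting index tensors, and a Data subclass controls this via `__inc__` and `__cat_dim__`.

```python
import torch
from torch_geometric.data import Batch, Data

class VirtualNodeData(Data):
    """Hypothetical Data subclass: `z_vn` holds virtual-node features and
    `edge_index_vn` connects real nodes (row 0) to virtual nodes (row 1)."""

    def __inc__(self, key, value, *args, **kwargs):
        if key == "edge_index_vn":
            # Offset the two rows by the number of real and virtual nodes,
            # respectively, when examples are collated into one graph.
            return torch.tensor([[self.num_nodes], [self.z_vn.size(0)]])
        return super().__inc__(key, value, *args, **kwargs)

    def __cat_dim__(self, key, value, *args, **kwargs):
        if key == "edge_index_vn":
            return 1  # edge indices concatenate along the column dimension
        return super().__cat_dim__(key, value, *args, **kwargs)

# Usage: the second graph's edge_index_vn rows are offset by 3 and 1.
d1 = VirtualNodeData(x=torch.randn(3, 8), z_vn=torch.randn(1, 8),
                     edge_index_vn=torch.tensor([[0, 1, 2], [0, 0, 0]]))
d2 = VirtualNodeData(x=torch.randn(2, 8), z_vn=torch.randn(1, 8),
                     edge_index_vn=torch.tensor([[0, 1], [0, 0]]))
batch = Batch.from_data_list([d1, d2])
```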

Models

  • Added two new models to the registry (CGCNN-VN and CGCNN-VN-HG, the heterogeneous variant).
  • Added TorchMD-NET and GemNet-OC.
  • A new routines concept: a way to organize model-augmenting sub-features such as custom pooling methods (a sketch of the registry pattern follows this list).
  • A new layers concept: a way to organize pieces of network architectures that do not function as standalone components.
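
The PR text doesn't show the routines API itself, so the following is only a minimal sketch of the decorator-registry pattern such a concept could follow; `Registry`, `register_routine`, and `mean_pool` are hypothetical names.

```python
from typing import Callable, Dict

from torch_geometric.nn import global_mean_pool

class Registry:
    """Hypothetical registry mapping routine names to callables."""
    routines: Dict[str, Callable] = {}

    @classmethod
    def register_routine(cls, name: str) -> Callable:
        def wrapper(fn: Callable) -> Callable:
            cls.routines[name] = fn
            return fn
        return wrapper

@Registry.register_routine("mean_pool")
def mean_pool(x, batch):
    # A pooling routine a model can select by its config key.
    return global_mean_pool(x, batch)

pool_fn = Registry.routines["mean_pool"]  # looked up from config
```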

Preprocessing

  • Batch processing, which takes advantage of PyG's batching internals. We specify the batch size in the config, process batches of data, then convert the processed data back to individual examples (see the sketch after this list; it will also be helpful to document the general approach, since the implementation is a bit involved).
  • Batch-processing-capable transforms. These can be specified in the config, and unsupported transforms raise appropriate errors; each transform must choose whether or not to implement this.
  • Transforms can now share a common set of arguments with preprocessing; this is done via the config file.
  • Addition of new transforms for the virtual node feature.
  • Additional helper methods.
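
A minimal sketch of the batched-preprocessing approach, assuming only PyG's public collation API (`process_batch` is a hypothetical stand-in for an arbitrary batch-capable transform):

```python
from torch_geometric.data import Batch

def preprocess_in_batches(data_list, batch_size, process_batch):
    """Collate examples into PyG batches, transform each batch, then
    split the result back into individual Data objects."""
    processed = []
    for i in range(0, len(data_list), batch_size):
        batch = Batch.from_data_list(data_list[i : i + batch_size])
        batch = process_batch(batch)            # batched transform
        processed.extend(batch.to_data_list())  # back to single examples
    return processed
```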

@sidnb13 (Member, Author) commented Jun 1, 2023

Some more notes:

  • The task configuration has a separate section for WandB configuration; see an example config file for details. This allows only a subset of hyperparameters to be tracked (preventing clutter) and allows for sweep configuration. Sweeps (a feature still pending more testing) are disabled by default; I plan to add more documentation on how they work for future use.
  • Model hyperparameters now live under model.hyperparams in the config rather than directly under model, and this nested object is passed directly to the model. This was done to improve organization, and it requires existing configs to be modified (a sketch of the idea follows).
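
Illustrative only, since the exact schema isn't shown here; the hyperparameter names, the wandb keys, and `DummyModel` are placeholders. The point is that the nested model.hyperparams object is forwarded verbatim to the model constructor:

```python
config = {
    "wandb": {"log_params": ["lr", "hidden_dim"]},  # hypothetical section
    "model": {
        "name": "CGCNN-VN",
        "hyperparams": {"hidden_dim": 128, "num_layers": 4},
    },
}

class DummyModel:  # stand-in for a registered model class
    def __init__(self, hidden_dim: int, num_layers: int):
        self.hidden_dim, self.num_layers = hidden_dim, num_layers

# The registry maps config["model"]["name"] to a class; the nested
# hyperparams dict is unpacked directly into its constructor:
model = DummyModel(**config["model"]["hyperparams"])
```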

@sidnb13 added the enhancement label Jun 7, 2023
@sidnb13 (Member, Author) commented Jun 9, 2023

Fixed a couple more minor config-related incompatibility bugs. Also resolved the ambiguity introduced by rank in the single-device case: the _forward implementation only uses rank if distributed training is enabled, and otherwise allocates the device based on the config (using min_alloc_gpu as a fallback). A sketch of this logic follows.
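
A minimal sketch of that device-resolution logic in plain PyTorch; the config keys are illustrative, and min_alloc_gpu is assumed here to mean "the GPU with the least memory currently allocated":

```python
import torch
import torch.distributed as dist

def resolve_device(rank: int, config: dict) -> torch.device:
    if dist.is_available() and dist.is_initialized():
        # Distributed training: rank unambiguously selects the device.
        return torch.device(f"cuda:{rank}")
    if config.get("device"):  # illustrative config key
        return torch.device(config["device"])
    if torch.cuda.is_available():
        # min_alloc_gpu fallback: pick the GPU with the least allocated memory.
        idx = min(range(torch.cuda.device_count()),
                  key=torch.cuda.memory_allocated)
        return torch.device(f"cuda:{idx}")
    return torch.device("cpu")
```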

Implemented (WIP) a model with a simpler attention-based pooling mechanism. It encodes a bit into the node features to indicate virtual (0) or real (1) and performs self-attention graph pooling. For now the pooling is global, but it would be straightforward to implement a hierarchical approach.
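
A minimal sketch of the idea, not the PR's model: append the real/virtual indicator bit as an extra feature channel, then pool globally with PyG's attention-based pooling. `is_real` and the layer sizes are illustrative assumptions.

```python
import torch
from torch import nn
from torch_geometric.nn import GlobalAttention

hidden = 64
# Gate network scores each node; the +1 channel is the real/virtual bit.
pool = GlobalAttention(gate_nn=nn.Linear(hidden + 1, 1))

def attention_pool(x, is_real, batch):
    bit = is_real.float().unsqueeze(-1)  # 1 = real node, 0 = virtual node
    return pool(torch.cat([x, bit], dim=-1), batch)

# Usage: one graph with 4 real nodes and 1 virtual node.
x = torch.randn(5, hidden)
is_real = torch.tensor([True, True, True, True, False])
batch = torch.zeros(5, dtype=torch.long)
out = attention_pool(x, is_real, batch)  # shape [1, hidden + 1]
```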
