This DGL example implements the model proposed in the paper InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization.
Author's code: https://github.com/fanyun-sun/InfoGraph
This example was implemented by Hengrui Zhang when he was an applied scientist intern at AWS Shanghai AI Lab.
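At its core, InfoGraph trains the encoder by maximizing the mutual information between graph-level (global) and node-level (local/patch) representations with a Jensen-Shannon MI estimator. Below is a minimal sketch of that objective; the tensor names, batch layout, and the dot-product discriminator are illustrative assumptions, not the exact code of this example.

```python
import math

import torch
import torch.nn.functional as F


def local_global_loss(local_h, global_h, graph_ids):
    """Jensen-Shannon MI estimator between node (local) and graph (global) embeddings.

    local_h:   (num_nodes, dim) node embeddings from the GIN encoder
    global_h:  (num_graphs, dim) readout (pooled) graph embeddings
    graph_ids: (num_nodes,) long tensor mapping each node to its graph
    """
    # Discriminator scores between every node and every graph in the batch.
    scores = local_h @ global_h.t()  # (num_nodes, num_graphs)

    # Positive pairs: a node with its own graph; negatives: all other graphs.
    pos_mask = F.one_hot(graph_ids, num_classes=global_h.size(0)).float()
    neg_mask = 1.0 - pos_mask

    # JSD lower bound on MI:
    #   E_pos[log 2 - softplus(-s)] - E_neg[softplus(-s) + s - log 2]
    e_pos = (math.log(2.0) - F.softplus(-scores)) * pos_mask
    e_neg = (F.softplus(-scores) + scores - math.log(2.0)) * neg_mask

    # Minimizing this loss maximizes the MI estimate.
    return e_neg.sum() / neg_mask.sum() - e_pos.sum() / pos_mask.sum()
```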
- Python 3.7
- PyTorch 1.7.1
- dgl 0.6.0
Unsupervised Graph Classification Datasets:
'MUTAG', 'PTC', 'IMDBBINARY' (IMDB-B), 'IMDBMULTI' (IMDB-M), 'REDDITBINARY' (RDT-B), 'REDDITMULTI5K' (RDT-M5K) from dgl.data.GINDataset.
Dataset | MUTAG | PTC | RDT-B | RDT-M5K | IMDB-B | IMDB-M |
---|---|---|---|---|---|---|
# Graphs | 188 | 344 | 2000 | 4999 | 1000 | 1500 |
# Classes | 2 | 2 | 2 | 5 | 2 | 3 |
Avg. Graph Size | 17.93 | 14.29 | 429.63 | 508.52 | 19.77 | 13.00 |
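As a quick way to inspect these datasets, they can be loaded with `dgl.data.GINDataset` and batched with `dgl.dataloading.GraphDataLoader`; a minimal sketch (the dataset name and batch size here are just examples):

```python
from dgl.data import GINDataset
from dgl.dataloading import GraphDataLoader

# Load MUTAG; each item is a (graph, label) pair.
dataset = GINDataset(name='MUTAG', self_loop=False)
print(len(dataset), 'graphs,', dataset.gclasses, 'classes')

# GraphDataLoader merges the graphs in a batch into a single batched DGLGraph.
loader = GraphDataLoader(dataset, batch_size=128, shuffle=True)
for batched_graph, labels in loader:
    print(batched_graph.batch_size, 'graphs in the first batch')
    break
```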
Semi-supervised Graph Regression Dataset:
QM9 dataset for graph property prediction (regression)
Dataset | # Graphs | # Regression Tasks |
---|---|---|
QM9 | 130,831 | 12 |
The 12 tasks are:
Keys | Description |
---|---|
mu | Dipole moment |
alpha | Isotropic polarizability |
homo | Highest occupied molecular orbital energy |
lumo | Lowest unoccupied molecular orbital energy |
gap | Gap between 'homo' and 'lumo' |
r2 | Electronic spatial extent |
zpve | Zero point vibrational energy |
U0 | Internal energy at 0K |
U | Internal energy at 298.15K |
H | Enthalpy at 298.15K |
G | Free energy at 298.15K |
Cv | Heat capacity at 298.15K |
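For reference, DGL also exposes QM9 through `dgl.data.QM9Dataset`, where `label_keys` selects among the regression targets above; a minimal sketch (note that `semisupervised.py` may prepare QM9 differently):

```python
from dgl.data import QM9Dataset

# Load QM9 with the dipole moment ('mu') as the regression label.
dataset = QM9Dataset(label_keys=['mu'])

graph, label = dataset[0]
print(graph.num_nodes(), 'atoms, mu =', float(label))
```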
Unsupervised Graph Classification (unsupervised.py):
--dataname str The graph dataset name. Default is 'MUTAG'.
--gpu int GPU index. Default is -1, using CPU.
--epochs int Number of training periods. Default is 20.
--batch_size int Size of a training batch. Default is 128.
--lr float Adam optimizer learning rate. Default is 0.01.
--log_interval int Interval between two evaluations. Default is 1.
--n_layers int Number of GIN layers. Default is 3.
--hid_dim int Dimension of hidden layers. Default is 32.
Semi-supervised Graph Regression (semisupervised.py):
--target str The regression task. Default is 'mu'.
--train_num int Number of supervised examples. Default is 5000.
--gpu int GPU index. Default is -1, using CPU.
--epochs int Number of training periods. Default is 200.
--batch_size int Size of a training batch. Default is 20.
--val_batch_size int Size of a validation batch. Default is 100.
--lr float Adam optimizer learning rate. Default is 0.001.
--hid_dim int Dimension of hidden layers. Default is 64.
--reg float Regularization weight. Default is 0.001.
Training and testing the unsupervised model on MUTAG.
(Since graphs in these datasets are small and sparse, moving them from CPU to GPU can take longer than the training itself, so we recommend using CPU for these datasets.)
```bash
# MUTAG:
python unsupervised.py --dataname MUTAG --n_layers 4 --hid_dim 32
```
Replace 'MUTAG' with a dataset name from ['MUTAG', 'PTC', 'IMDBBINARY', 'IMDBMULTI', 'REDDITBINARY', 'REDDITMULTI5K'] if you'd like to try other datasets.
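Following the paper's evaluation protocol, the learned graph embeddings are scored with a downstream SVM classifier under cross-validation. A minimal sketch of that step, where `embeddings.npy` and `labels.npy` are hypothetical dumps of the trained encoder's output and the graph labels:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Hypothetical dumps: graph-level embeddings from the trained encoder
# and the corresponding graph class labels.
embeddings = np.load('embeddings.npy')  # (num_graphs, dim)
labels = np.load('labels.npy')          # (num_graphs,)

# 10-fold cross-validated accuracy of a linear SVM on the frozen embeddings.
scores = cross_val_score(LinearSVC(max_iter=10000), embeddings, labels, cv=10)
print('accuracy: %.2f +/- %.2f' % (scores.mean() * 100, scores.std() * 100))
```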
Training and testing the semi-supervised model on QM9 for the graph property 'mu' on GPU.
```bash
# QM9:
python semisupervised.py --gpu 0 --target mu
```
Replace 'mu' with any of the other target keys listed above.
The hyperparameter setting in our implementation is identical to that reported in the paper.
Dataset | MUTAG | PTC | RDT-B | RDT-M5K | IMDB-B | IMDB-M |
---|---|---|---|---|---|---|
Accuracy Reported | 89.01 | 61.65 | 82.50 | 53.46 | 73.03 | 49.69 |
DGL | 89.88 | 63.54 | 88.50 | 56.27 | 72.70 | 50.13 |
- The REDDIT-M (RDT-M5K) dataset takes quite a long time to load and evaluate.
Here we only provide results for 'mu', 'alpha', and 'homo'.
Target | mu | alpha | homo |
---|---|---|---|
MAE Reported | 0.3169 | 0.5444 | 0.0060 |
The authors' code | 0.2411 | 0.5192 | 0.1560 |
DGL | 0.2355 | 0.5483 | 0.1581 |
- The source of the QM9 dataset has changed, so there is a gap between the MAE reported in the paper and the one we reproduced.
- See this issue for the authors' response.