Implementation of "Graph Representation Learning Network via Adaptive Sampling" (GATAS): http://arxiv.org/abs/2006.04637
The algorithm represents nodes by reducing their neighbour representations with attention. Multi-step neighbour representations incorporate different path properties. Neighbours are sampled using learnable depth coefficients.
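As a rough, self-contained sketch of these ideas (all names, shapes and the sampling scheme below are hypothetical simplifications, not the exact model from the paper): a softmax over learnable depth coefficients gives the probability of drawing a neighbour at each step depth, sampled neighbours are conditioned on their step, and an attention reduction produces the node representation.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
num_steps = 3          # maximum neighbourhood depth
num_neighbours = 8     # neighbours sampled per node
feature_size = 16

# Learnable depth coefficients: their softmax is the probability of
# sampling a neighbour at each step depth.
depth_coefficients = rng.normal(size=num_steps)
step_probabilities = softmax(depth_coefficients)
neighbour_steps = rng.choice(num_steps, size=num_neighbours, p=step_probabilities)

# Hypothetical neighbour representations, shifted by a per-step embedding
# standing in for path properties.
step_embeddings = rng.normal(size=(num_steps, feature_size))
neighbour_features = rng.normal(size=(num_neighbours, feature_size)) + step_embeddings[neighbour_steps]

# Attention reduction: score each neighbour against the target node
# and take the weighted sum as the node representation.
node_feature = rng.normal(size=feature_size)
attention_weights = softmax(neighbour_features @ node_feature / np.sqrt(feature_size))
node_representation = attention_weights @ neighbour_features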
This repository is organized as follows:
data/ contains the necessary files for the Cora, Pubmed, Citeseer, PPI, Twitter and YouTube datasets.
framework/ contains helper libraries for model development, training and evaluation.
gatas/ contains the implementation of GATAS.
node_classification/ contains a node label classifier using the model.
link_prediction/ contains a link predictor using the model.
First, we must create a CSR binary representation of the graph, where the values are the edge type indices. For the Cora, Citeseer and PubMed datasets, these representations have been precomputed and placed in data/. For PPI, it can be computed with:
python3 -m node_classification.datasets.ppi --path data/ppi
For the Twitter and YouTube datasets, it can be computed with:
python3 -m link_prediction.datasets.gatne --path {path to dataset} --num-edge-types {number of edge types}
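The binary format written by these scripts is specific to this repository, but as a rough illustration of a CSR graph whose stored values are edge type indices (the toy edge list below is made up), using scipy:

import numpy as np
from scipy.sparse import csr_matrix

# Toy multigraph with 4 nodes; each edge stores its edge type index as its value.
sources = np.array([0, 0, 1, 2, 3])
targets = np.array([1, 2, 2, 3, 0])
edge_types = np.array([1, 2, 1, 3, 2])

adjacency = csr_matrix((edge_types, (sources, targets)), shape=(4, 4))

# CSR components: indptr delimits each node's slice of neighbours,
# indices holds the neighbour ids, data holds the edge type indices.
print(adjacency.indptr)   # [0 2 3 4 5]
print(adjacency.indices)  # [1 2 2 3 0]
print(adjacency.data)     # [1 2 1 3 2]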
The scripts above will also collect and preprocess the node features when available, and create dataset splits with inputs and targets for the tasks. Once we have a CSR graph representation, we can compute the transition probabilities by running:
python3 -m gatas.transitions --path {path to dataset} --num_steps {number of steps}
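The gatas.transitions module precomputes these in its own format; as a generic illustration of the underlying idea, one-step transition probabilities can be obtained by row-normalizing the adjacency structure, and the k-step probabilities by taking matrix powers (toy graph below):

import numpy as np

# Toy adjacency structure (edge types ignored in this illustration).
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=np.float64)

# Row-normalize to get one-step transition probabilities.
degrees = adjacency.sum(axis=1, keepdims=True)
transition = adjacency / np.maximum(degrees, 1.0)

# The k-step transition probabilities are the k-th power of the transition matrix.
num_steps = 3
for k in range(1, num_steps + 1):
    print(f'step {k}:')
    print(np.linalg.matrix_power(transition, k).round(2))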
GATAS has two components: NeighbourSampler and NeighbourAggregator. NeighbourSampler can be initialized with a path so that the precomputed transition data can be used:
from gatas.sampler import NeighbourSampler
neighbour_sampler = NeighbourSampler.from_path(num_steps=3, path='data/ppi')
NeighbourAggregator can receive a matrix of node features and can be initialized as follows:
import numpy as np
from gatas.aggregator import NeighbourAggregator
node_features = np.load('data/ppi/node_embeddings.npy')
neighbour_aggregator = NeighbourAggregator(
    input_noise_rate=0.,
    dropout_rate=0.,
    num_nodes=node_features.shape[0],
    num_edge_types=neighbour_sampler.num_edge_types,
    num_steps=3,
    edge_type_embedding_size=5,
    node_embedding_size=None,
    layer_size=256,
    num_attention_heads=10,
    node_features=node_features,
)
We can call neighbour_aggregator with the output of neighbour_sampler. This pattern is used in the node classification and link prediction tasks. You can train those models with:
python3 -m node_classification.train --data-path {path to dataset}
or:
python3 -m link_prediction.train --data-path {path to dataset}
where additional parameters can be passed through the command line. Run with --help for a list of them:
python3 -m node_classification.train --help
If you use this code in your work, please cite:

@misc{andrade2020graph,
    title={Graph Representation Learning Network via Adaptive Sampling},
    author={Anderson de Andrade and Chen Liu},
    year={2020},
    eprint={2006.04637},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
License: MIT