Local Graph Clustering provides
- methods that find local clusters in a given graph without touching the whole graph
- methods that improve a given cluster
- methods for global graph partitioning
- tools to compute Network Community Profiles
- scalable graph analytics on your laptop
The current version is 0.5.0, and it is aimed at intermediate and expert users. For questions and feedback, contact any of the people listed below.
- Kimon Fountoulakis, email: kimon.fountoulakis at uwaterloo dot ca
- Meng Liu, email: liu1740 at purdue dot edu
- David Gleich, email: dgleich at purdue dot edu
- Michael Mahoney, email: mmahoney at stat dot berkeley dot edu
- Chufeng Hu, email: chufeng dot hu at uwaterloo dot ca
- Yuying Li, email: yuying at uwaterloo dot ca
- Ben Johnson, email: bkj dot 322 at gmail dot com
- Approximate PageRank
- L1-regularized PageRank (solved using accelerated proximal gradient descent)
- PageRank Nibble
- Rounding methods for spectral embeddings
- MQI
- FlowImprove
- SimpleLocal
- Capacity Releasing Diffusion
- Multiclass label prediction
- Network Community Profiles
- Global spectral partitioning
- Find k clusters using local graph clustering (local-to-global graph clustering: local methods are used to partition the whole graph)
- Graph partitioning using local graph clustering
- Image segmentation using local graph clustering
- Densest subgraph
- Triangle clusters and vertex neighborhood metrics
- Handy network drawing methods
- Network Community Profiles
- Multiclass label prediction
- Triangle clusters and vertex neighborhood metrics
- Find k clusters (local to global graph clustering)
- Image segmentation
- Image segmentation using gPb
- Find small clusters in image using conductance
- p-Norm Flow Diffusion for Local Graph Clustering
- Flow-based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance (experiments conducted on this branch)
- A Short Introduction to Local Graph Clustering Methods and Software
All examples are in the notebooks folder.
The simple demonstration below, taken from test.py in the notebooks folder, shows how to improve a spectral partition using flow-based methods from local graph clustering.
from localgraphclustering import *
import time
import numpy as np
# Read graph. This also supports gml and graphml format.
g = GraphLocal('./datasets/senate.edgelist','edgelist',' ')
# Call the global spectral partitioning algorithm.
eig2 = fiedler(g)
# Round the eigenvector
output_sc = sweep_cut(g,eig2)
# Extract the partition for g and store it.
eig2_rounded = output_sc[0]
# Conductance before improvement
print("Conductance before improvement:",g.compute_conductance(eig2_rounded))
# Start calling SimpleLocal
start = time.time()
output_SL_fast = SimpleLocal(g,eig2_rounded)
end = time.time()
print("running time:",str(end-start)+"s")
# Conductance after improvement
print("Conductance after improvement:",g.compute_conductance(output_SL_fast[0]))
output_SL = output_SL_fast[0]
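To make the "round the eigenvector" step above more concrete, here is a minimal NumPy-only sketch of spectral partitioning followed by a sweep cut on a toy graph. It is an illustration under stated assumptions, not the package's implementation: the helper names `fiedler_embedding` and `sweep_cut_sketch` are hypothetical, the graph is a dense adjacency matrix with no isolated nodes, and conductance is recomputed from scratch for every prefix, which would be far too slow for large graphs.

```python
import numpy as np

def conductance(A, mask):
    """Conductance of the node set selected by the boolean array `mask`."""
    deg = A.sum(axis=1)
    cut = A[mask][:, ~mask].sum()            # weight of edges leaving the set
    denom = min(deg[mask].sum(), deg[~mask].sum())
    return cut / denom if denom > 0 else np.inf

def fiedler_embedding(A):
    """Second eigenvector of the normalized Laplacian, rescaled by D^{-1/2}.

    Hypothetical helper; assumes every node has nonzero degree.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(deg)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues come back in ascending order
    return d_inv_sqrt * vecs[:, 1]

def sweep_cut_sketch(A, x):
    """Return the prefix of nodes (sorted by x, descending) with least conductance."""
    order = np.argsort(-x)
    best_set, best_phi = None, np.inf
    for k in range(1, len(order)):           # proper, non-empty prefixes only
        mask = np.zeros(len(order), dtype=bool)
        mask[order[:k]] = True
        phi = conductance(A, mask)
        if phi < best_phi:
            best_set, best_phi = set(order[:k].tolist()), phi
    return best_set, best_phi

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

cluster, phi = sweep_cut_sketch(A, fiedler_embedding(A))
print(sorted(cluster), phi)
```

On this toy graph the sweep should return one of the two triangles, since cutting the single bridging edge gives the smallest conductance among all prefixes. Which triangle comes back depends on the arbitrary sign of the eigenvector.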
For general examples with visualization using our built-in drawing methods, see the Jupyter notebook examples with visualization.
For comparisons of spectral- and flow-based methods with visualization see the Jupyter notebooks here and here.
For visual demonstration of algorithms that can improve a given seed set of nodes see the Jupyter notebook here.
For examples using reasonably large graphs (100 million edges) on a 16GB RAM laptop please see the Jupyter notebook here.
For advanced examples see the Jupyter notebook here.
In theory and in practice, we have observed that the performance of local graph clustering methods depends on the magnitude of the conductance of the target cluster as well as the magnitude of the minimum conductance in the induced subgraph of the target cluster. Simply put, if the "internal connectivity" of the target cluster (the minimum conductance in the induced subgraph of the target cluster) is not stronger than the "external connectivity" (the conductance of the target cluster), then local graph clustering methods perform poorly at finding the target cluster. For theoretical details, please see Section 3 of the Capacity Releasing Diffusion for Speed and Locality paper. For extensive numerical experiments that demonstrate properties of challenging target clusters, please see Section 4 of the same paper as well as the supplementary material at the same link.
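The two quantities in the paragraph above can be computed directly on a tiny graph. The sketch below is a brute-force illustration in plain Python, assuming an unweighted, undirected graph stored as a dictionary of neighbour sets. The helper names are hypothetical and are not part of the package, and enumerating all subsets is only feasible for a handful of nodes.

```python
from itertools import combinations

def build_adj(edges):
    """Adjacency as dict: node -> set of neighbours (undirected, unweighted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def conductance(adj, S):
    """cut(S) / min(vol(S), vol(complement of S)) within the graph `adj`."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")

def min_internal_conductance(adj, S):
    """Minimum conductance over all cuts of the subgraph induced by S (brute force)."""
    S = set(S)
    sub = {u: adj[u] & S for u in S}         # induced subgraph on S
    nodes = sorted(sub)
    best = float("inf")
    # Conductance is symmetric under complement, so sizes up to n // 2 suffice.
    for k in range(1, len(nodes) // 2 + 1):
        for T in combinations(nodes, k):
            best = min(best, conductance(sub, T))
    return best

# Toy graph: two 4-cliques joined by the single edge (3, 4).
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
adj = build_adj(edges)
target = {0, 1, 2, 3}

external = conductance(adj, target)               # "external connectivity"
internal = min_internal_conductance(adj, target)  # "internal connectivity"
print("external:", external, "internal:", internal)
```

Here the target clique has external conductance 1/13 and internal conductance 2/3, so its internal connectivity is much stronger than its external connectivity. This is the regime in which the paragraph above expects local methods to do well.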
Clone the repo
Enter the folder using the terminal
In the terminal, run `python setup.py install`
Note that this package runs only with Python 3 on Mac or Linux.
- In Julia, add the PyCall package:
Pkg.add("PyCall")
- Set the version of Python that PyCall uses by default:
ENV["PYTHON"] = (path to python3 executable)
Pkg.build("PyCall")
(You can get the path to the python3 executable by just running "which python3" in the terminal.)
- Make sure the PyPlot package is added in Julia.
- Import localgraphclustering by using:
using PyPlot
using PyCall
@pyimport localgraphclustering
You can now use any routine in localgraphclustering from Julia.
MIT License
Copyright (C) 2020 Kimon Fountoulakis, Meng Liu, David Gleich and Michael W. Mahoney.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.