Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjacency matrix for only some vertices #2196

Open
szhorvat opened this issue Sep 11, 2022 · 8 comments · May be fixed by #2379
Open

Adjacency matrix for only some vertices #2196

szhorvat opened this issue Sep 11, 2022 · 8 comments · May be fixed by #2379
Assignees
Labels
wishlist Feature request that has not been chosen for implementation yet; vote or comment to prioritize it!
Milestone

Comments

@szhorvat
Copy link
Member

What is the feature or improvement you would like to see?

Currently there is igraph_get_adjacency(_sparse)() to construct the adjacency matrix of a graph. There is also igraph_get_incidence() for a bipartite incidence matrix.

I propose a generalization of both where the vertices corresponding to the matrix rows/column can be specified:

igraph_error_t igraph_get_adjaceny_submatrix(
        const igraph_t *graph, igraph_matrix_t *res,
        const igraph_vs_t rows, const igraph_vs_t cols,
        const igraph_vector_t *weights, igraph_loops_t loops);

This would generate the submatrix corresponding the specified row-vertices and column-vertices.

There should be a sparse version as well.

Use cases for the feature

  • Fast way to obtain the adjacency matrix of a small subgraph (when rows and cols are the same).
  • A single function handles both the adjacency matrix and bipartite incidence matrix cases.
  • Obtain connectivity between different blocks, e.g. two clusters out of many, obtained with a community detection algorithm.
  • There are matrix-based algorithms that must be applied per-connected-component, such as Seidel's shortest path algorithm. This is a convenient way to get a matrix for each component.

One of the purposes of this function would be to offer better performance and better memory usage than generating the entire adjacency matrix first then subsetting it.

References

@szhorvat szhorvat added this to the 0.11 milestone Sep 11, 2022
@szhorvat szhorvat added the wishlist Feature request that has not been chosen for implementation yet; vote or comment to prioritize it! label Sep 11, 2022
@GroteGnoom GroteGnoom self-assigned this Jul 22, 2023
@GroteGnoom
Copy link
Member

This has the same snag as subset betweenness: partially overlapping vertex selections for the undirected case need special care.
(see #2352 (comment))

@szhorvat
Copy link
Member Author

szhorvat commented Jul 23, 2023

Does it? I believe this is a simple matrix subsetting operation. I.e. the result should be the same as taking only some rows and only some columns in the adjacency matrix.

The fact that in the undirected partially overlapping case some connections will appear twice (below and above diagonal) and some once should not be a problem here.

Let me know if I misunderstood something.

@GroteGnoom
Copy link
Member

Doesn't that take an unnecessary amount of time? igraph_get_adjacency() loops through all edges to take their to and froms.

For the submatrix variant I think going through all edges and then check if their to and froms are in the vertex selectors is not the most efficient way of handling this. I want to go through the vertex selectors and get the edges between them. Then you hit this problem.

@szhorvat
Copy link
Member Author

Doesn't that take an unnecessary amount of time?

I didn't mean that it should be implemented by subsetting the complete matrix. That makes no sense, as there would be no performance gain in using this function. I simply meant to describe the expected result.

I want to go through the vertex selectors and get the edges between them. Then you hit this problem.

I see what you mean.


How about this?

  • Create a bool array is_target where is_target[v] is true is v is a to vertex
  • For each from vertex s:
    • Get the neighbour list, and for each neighbour t:
      • If is_target[t] then matrix[s, t] += 1.

I think this gives the correct result both for directed and undirected graphs. However, its performance characteristics are not invariant to the exchange of to and from. I'm not sure if this is an issue in practice. If we wanted to try to squeeze more performance, we have two version, one where the outer loop is on from and one where it is on to. Then choose the version where the outer loop is shorter.


An alternative implementation will not construct is_target. Instead, it creates a sorted list of to vertex IDs. Then it takes the intersection of this sorted list with the neighbour list. Note that the neighbour list is guaranteed to be sorted.

We already have such sorted intersection functions,

  • igraph_i_neisets_intersect in cocitation.c
  • In the cycle basis calculation I computed a symmetric difference in a similar way

Maybe we should have helper functions for union, intersection, difference, symmetric difference for sorted integer lists.


I suspect that one of these two approaches should be faster than starting with a source and target vertex ID, then getting the edges inbetween.

But this is difficult to predict ...

What do you think @GroteGnoom ?

@szhorvat
Copy link
Member Author

These are the two common ways to intersect vertex sets, and I'm never sure which is faster, or under what conditions each is faster. E.g. igraph_ecc() uses the sort method and igraph_i_local_relative_density() uses the Boolean mask method.

GroteGnoom added a commit to GroteGnoom/igraph that referenced this issue Jul 29, 2023
@GroteGnoom
Copy link
Member

How about this?

  • Create a bool array is_target where is_target[v] is true is v is a to vertex
  • For each from vertex s:
    • Get the neighbour list, and for each neighbour t:
      • If is_target[t] then matrix[s, t] += 1.

Lets use an example to clear things up. I think the simplest case where we have problems is a 3-vertex full graph with no loops, from = [0,1,2], to = [0,1].

The full GET_ADJACECNY_BOTH adjacency matrix for the whole graph would look like this:

0 1 1
1 0 1
1 1 0

we have 3 selected edges:
0-1
0-2
1-2

First, GET_ADJACENCY_LOWER and GET_ADJACENCY_UPPER don't always make sense. The 'from 0 to 2' and 'from 1 to 2' elements don't exist in the submatrix, so GET_ADJACENCY_UPPER will miss edges.

GET_ADJACENCY_LOWER would look like this:

0 0
1 0
1 1

GET_ADJACENCY_UPPER would look like this:

0 1
0 0
0 0

And this is GET_ADJACENCY_BOTH

0 1
1 0
1 1

... I'll have to think a bit.

@szhorvat
Copy link
Member Author

For this application, there should be no igraph_get_adjacency_t type parameter. As you point out, it makes no sense.

@GroteGnoom GroteGnoom linked a pull request Jul 29, 2023 that will close this issue
@GroteGnoom
Copy link
Member

GroteGnoom commented Jul 29, 2023

I think that just solves everything 🙂
There could be performance improvements, including your suggestion.

I think this also shows the betweenness_subset problem in matrix form: neither the upper triangle or lower triangle necessarily has all the selected edges, and the whole matrix sometimes has an edge twice, sometimes once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wishlist Feature request that has not been chosen for implementation yet; vote or comment to prioritize it!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants