Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for jaccard_coefficient #62

Merged
merged 12 commits into from
Jan 30, 2025
14 changes: 1 addition & 13 deletions nx_cugraph/algorithms/link_prediction.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,19 +49,7 @@ def jaccard_coefficient(G, ebunch=None):
# for invalid node IDs in ebunch.
u_indices = G._list_to_nodearray(u)
v_indices = G._list_to_nodearray(v)
except KeyError as n:
raise nx.NodeNotFound(f"Node {n} not in G.")

# If G was not renumbered, then the ebunch nodes must be explicitly
# checked. If not done, plc.jaccard_coefficients() will accept node IDs
# not in the graph and return a coefficient of 0 for them, which is not
# compatible with NX.
if (not hasattr(G, "key_to_id") or G.key_to_id is None) and (
(n := u_indices.max()) >= G._N
or (n := v_indices.max()) >= G._N
or (n := u_indices.min()) < 0
or (n := v_indices.min()) < 0
):
except (KeyError, ValueError) as n:
raise nx.NodeNotFound(f"Node {n} not in G.")

(u, v, p) = plc.jaccard_coefficients(
Expand Down
4 changes: 4 additions & 0 deletions nx_cugraph/classes/graph.py
Original file line number Diff line number Diff line change
Expand Up @@ -1152,6 +1152,10 @@ def _nodearray_to_list(self, node_ids: cp.ndarray[IndexValue]) -> list[NodeKey]:
def _list_to_nodearray(self, nodes: list[NodeKey]) -> cp.ndarray[IndexValue]:
if (key_to_id := self.key_to_id) is not None:
nodes = [key_to_id[node] for node in nodes]
else:
valids = [isinstance(n, int) and n >= 0 and n < self._N for n in nodes]
if not all(valids):
raise ValueError(nodes[valids.index(False)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, right, values like 4.5 are also invalid.

There are many types of integers floating around so isinstance(n, int) may be inadequate, and isinstance(n, numbers.Integral) is slow, so here's an alternative

N = self._N
for node in nodes:
    try:
        node = index(node)  # Ensure integral value
    except TypeError:
        raise KeyError(node) from None
    if node < 0 or node >= N:
        raise KeyError(node)

where we import index from operators.

This is a little more strict than NetworkX

In [2]: G = nx.complete_graph(10)

In [3]: 1 in G
Out[3]: True

In [4]: 1.0 in G
Out[4]: True

so another alternative could be

N = self._N
for node in nodes:
    try:
        n = int(node)
    except TypeError:
        raise KeyError(node) from None
    if n != node or n < 0 or n >= N:
        raise KeyError(node)

Also, what do you think of making this verification opt-in by adding a keyword argument to the method? I'm conflicted, b/c I like performance. We have sometimes played a little fast and loose in the name of performance, and we may not catch every invalid node that NetworkX would catch. This has been somewhat intentional: functions in networkx don't have consistent behavior, so I would want to have e.g. randomized tests like Ross talked about to stress both networkx and networkx backends for exceptional behavior. OTOH, it's probably safest to do this check here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what do you think of making this verification opt-in by adding a keyword argument to the method?

I like that idea. It seems like many of our functions could take an argument that essentially says "skip input verification because I know they're valid" for use cases optimized for performance. This seems like a good topic for the dispatching meeting, even though what you're proposing is or could be specific to nx-cugraph's kwargs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this feature request for the opt-in suggestion above: #70

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original thought was to keep this an internal option--i.e., keyword--to _list_to_nodearray (and maybe others) to control whether to do this check. E.g., some networkx algorithms may or may not raise.

return cp.array(nodes, dtype=index_dtype)

def _nodearray_to_set(self, node_ids: cp.ndarray[IndexValue]) -> set[NodeKey]:
Expand Down
14 changes: 14 additions & 0 deletions nx_cugraph/tests/test_link_prediction.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,3 +29,17 @@ def test_no_nonexistent_edges_no_ebunch():
result = nx.jaccard_coefficient(G)
assert isinstance(result, Iterable)
assert pytest.raises(StopIteration, next, result)


def test_node_not_found_in_ebunch():
"""Test that all nodes in ebunch are valid

Ensure function raises NodeNotFound for invalid nodes in ebunch.
"""
G = nx.Graph([(0, 1), (1, 2)])
with pytest.raises(nx.NodeNotFound, match="Node A not in G."):
nx.jaccard_coefficient(G, [("A", 1)])
with pytest.raises(nx.NodeNotFound, match=r"Node \(1,\) not in G."):
nx.jaccard_coefficient(G, [(0, (1,))])
with pytest.raises(nx.NodeNotFound, match="Node 9999 not in G."):
nx.jaccard_coefficient(G, [(0, 9999)])
Loading