Can GraphRole be used on large networks? #6

Open
rjurney opened this issue Aug 31, 2021 · 8 comments
@rjurney
Contributor

rjurney commented Aug 31, 2021

We are interested in using this on a billion-node network. How well does it scale to large graphs? We can partition our network if required, but we don't know whether this is a multi-core implementation via networkx or something unlikely to scale beyond small networks.

@dkaslovsky
Owner

Hi @rjurney, thanks for your interest in using GraphRole. This package hasn't been tested at the scale you mention, and part of the implementation uses Pandas, which might have problems at this scale.

One thing to note though is that GraphRole is not dependent on any particular graph library, so it can be integrated with any scalable graph library of your choice. All that needs to be done is to satisfy the required interface and make it discoverable. The steps are:

  1. Subclass the BaseGraphInterface class in graphrole/graph/interface/base.py and implement the required methods
  2. Update the INTERFACES dict in graphrole/graph/interface/__init__.py to make the new subclass discoverable

See full instructions in the README for setting up tests if so desired.
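
The two steps above can be sketched roughly as follows. This is a self-contained illustration rather than GraphRole's actual code: `BaseGraphInterfaceSketch`, its three method names, `AdjacencyGraph`, `AdjacencyInterface`, and `get_interface` are all hypothetical stand-ins, and the real `INTERFACES` dict may be keyed differently (here it is keyed by graph type for simplicity). Check `BaseGraphInterface` itself for the methods it actually requires.

```python
from abc import ABC, abstractmethod
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple, Type


class BaseGraphInterfaceSketch(ABC):
    """Hypothetical stand-in for GraphRole's BaseGraphInterface."""

    @abstractmethod
    def get_nodes(self) -> Iterable[Hashable]:
        ...

    @abstractmethod
    def get_neighbors(self, node: Hashable) -> Iterable[Hashable]:
        ...

    @abstractmethod
    def get_num_edges(self) -> int:
        ...


class AdjacencyGraph:
    """Toy undirected graph standing in for a scalable library's graph object."""

    def __init__(self, edges: Iterable[Tuple[Hashable, Hashable]]) -> None:
        self.adj: Dict[Hashable, Set[Hashable]] = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)


# Step 1: subclass the base interface and implement the required methods.
class AdjacencyInterface(BaseGraphInterfaceSketch):
    def __init__(self, G: AdjacencyGraph) -> None:
        self.G = G

    def get_nodes(self) -> List[Hashable]:
        return sorted(self.G.adj)

    def get_neighbors(self, node: Hashable) -> List[Hashable]:
        return sorted(self.G.adj[node])

    def get_num_edges(self) -> int:
        # Each undirected edge is stored once per endpoint (assumes no self-loops).
        return sum(len(nbrs) for nbrs in self.G.adj.values()) // 2


# Step 2: register the subclass so it can be discovered for a given graph object.
INTERFACES: Dict[type, Type[BaseGraphInterfaceSketch]] = {
    AdjacencyGraph: AdjacencyInterface,
}


def get_interface(G: object) -> Optional[BaseGraphInterfaceSketch]:
    """Return an interface instance for G, or None if its type is not registered."""
    interface_cls = INTERFACES.get(type(G))
    return interface_cls(G) if interface_cls is not None else None


graph = AdjacencyGraph([("a", "b"), ("b", "c")])
interface = get_interface(graph)
```

A distributed backend would follow the same shape, with the method bodies delegating to that library's graph operations instead of an in-memory dict.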

I'd be very interested to know how it works out if you go down this route, please keep me posted!

@rjurney
Contributor Author

rjurney commented Aug 31, 2021

@dkaslovsky thanks, this is really helpful. What you've done here is really cool. I am encouraging the Deep Discovery team to implement this using PySpark and GraphFrames, and if we do, we will contribute it back, but setting up testing and the rest may take some time. We'll do an intermediate PR to get things started. cc @ajs-dd

@dkaslovsky
Owner

That's really exciting to hear. I've thought about adding a more scalable dataframe library in the past, so I'm glad that you and your team might look into implementing this, and I'd be grateful for any contribution back to GraphRole. Please let me know if there's any help I can provide along the way!

@dkaslovsky
Owner

Oh, one other thought I forgot to mention is that Dask might also be a good option to explore for distributed dataframe functionality.

@rjurney
Contributor Author

rjurney commented Aug 31, 2021

@dkaslovsky yeah, but we have a 1.5 billion node business graph, so we need it to work across multiple machines and to have graph abstractions rather than just DataFrame ones. This is why GraphFrames is really nice: it runs on Spark and uses DataFrames but has graph operations.

https://graphframes.github.io/graphframes/docs/_site/index.html

@dkaslovsky
Owner

Ah, I see. A graphframes-based implementation sounds very appealing!

@rjurney
Contributor Author

rjurney commented Mar 26, 2023

@dkaslovsky in which PR? How?

@dkaslovsky
Owner

@rjurney Apologies, closing this was in error. Reopening.

dkaslovsky reopened this Mar 26, 2023