
What does the --largest mean? #39

Open
hui-liu opened this issue Oct 16, 2020 · 1 comment

hui-liu commented Oct 16, 2020

Hi,

If I set --largest to the default (100), I get more than 2000 modules, all with sizes smaller than or equal to 100. If I instead set --largest to the total number of genes, I get only ~30 modules, and the biggest one can contain 20000 nodes. All other parameters were the same between runs; only the value of --largest changed.

Based on the manual, I thought --largest meant that the maximum size of the output modules should be smaller than the value I set, and that modules larger than that value would be discarded. However, as described above, it does more than just limit the module size of the output.

I am wondering why the default is 100, and how I should set the parameters in MONET if I don't want to limit the output module size.

Thank you so much!

sergio-gomez (Contributor) commented
Hi,

The default value of 100 was a requirement of the Disease Module Identification (DMI) DREAM Challenge: modules with sizes below 3 or above 100 were discarded in the evaluation process, since they were assumed to contain no valuable information for the identification of disease-related modules. Each algorithm uses a different strategy to find the modules and to fulfil these size restrictions. The behaviour you describe is perfectly normal in community detection when there is large heterogeneity in module sizes: without an upper bound on the size, you can have a few very large modules and many more smaller ones. The upper bound breaks the large modules into smaller ones (probably submodules of the primary modules).
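Purely as an illustrative sketch (not part of MONET or the challenge code), the evaluation-side filtering described above amounts to something like:

```python
# Hypothetical illustration of the DMI DREAM Challenge size filter:
# modules with fewer than 3 or more than 100 nodes were discarded
# before evaluation. `modules` is assumed to be a list of node sets.
def keep_for_evaluation(modules, smallest=3, largest=100):
    return [m for m in modules if smallest <= len(m) <= largest]
```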

I can explain a little more about my algorithm, M1 (the modularity-based one). It is based on adjusting the resolution at which you want to find the communities. The resolution parameter lets you tune the resistance of nodes to forming communities: with a high value you get many small communities; with a small value the modules are few and large. To generate modules within the challenge bounds, we search for a reasonable value of the resistance parameter such that most of the nodes are contained in modules with sizes inside the desired range. To avoid excessive fragmentation, we first allow modules larger than desired (e.g., up to about 5*100 = 500 nodes), and then refine those oversized modules with an additional round of fragmentation into smaller modules, as sketched below.
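For intuition only, here is a rough Python sketch of that two-stage strategy. It is not MONET's actual M1 code: it uses networkx's Louvain implementation and its `resolution` parameter as a stand-in for M1's resistance parameter, and the function names, slack behaviour, and resolution schedule are my own illustrative choices.

```python
# Illustrative sketch only -- NOT MONET's M1 implementation.
# Louvain's `resolution` plays the role of M1's resistance parameter:
# higher values yield more, smaller communities.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def split_module(G, nodes, largest, resolution):
    """Recursively re-fragment an oversized module, raising the
    resolution each time to force smaller submodules."""
    if len(nodes) <= largest:
        return [set(nodes)]
    parts = louvain_communities(G.subgraph(nodes),
                                resolution=resolution, seed=0)
    if len(parts) <= 1:          # cannot split further; keep as-is
        return [set(nodes)]
    out = []
    for p in parts:
        out += split_module(G, set(p), largest, resolution * 1.5)
    return out

def two_stage_modules(G, largest=100):
    # First pass at a moderate resolution: most modules should land
    # near the desired range, though some may still exceed `largest`
    # (e.g., up to roughly 5 * largest, as described above).
    first = louvain_communities(G, resolution=1.0, seed=0)
    # Second pass: refine only the modules that are still too large.
    modules = []
    for m in first:
        modules += split_module(G, set(m), largest, resolution=1.5)
    return modules
```

For example, `two_stage_modules(nx.les_miserables_graph(), largest=20)` returns a partition in which any oversized first-pass community has been broken into submodules of at most 20 nodes (whenever Louvain can split it further).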

Hope this helps you understand the behaviour of MONET.

All the best.
