R "killed" or cores not returning data #136
If I adjust the divnet() parameters to the following, I can get divnet() to work for datasets up to dim(27, 2082). It is worth noting that increasing ncores causes a failure.
I tried drastically decreasing the iteration counts to see what happens, but the datasets with dim(27, >5000) continue to fail.
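For reference, a minimal sketch of what "decreasing iterations" can look like in DivNet — the exact parameter values here are illustrative, not the ones used above (which were not posted), and `W` stands in for the 27 x 2082 count matrix:

```r
library(DivNet)

# W: hypothetical sample-by-taxon count matrix, e.g. dim(27, 2082)
# DivNet's tuning argument accepts "fast", "careful", or a named list
# of EM / Monte Carlo iteration counts; lower values trade accuracy
# for memory and runtime.
dv <- divnet(
  W,
  tuning = list(EM.iter = 6, EM.burn = 3, MC.iter = 250, MC.burn = 100),
  ncores = 1  # single-core, since extra workers failed to return data
)
```

Running single-core at least avoids the "cores not returning data" failure mode, at the cost of wall-clock time.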
I will try divnet-rs and report back.
Were you ever able to make any progress with this?
@mooreryan I currently have DivNet working in R and have done benchmarking. Something about my data causes relatively small sets to fail due to memory limits. We have decided not to use the Rust implementation for now in order to keep the pipeline friendly for bioinformaticians/comp. sci. users.
Could you post the benchmarks, and the sizes of the data sets that cause failures?
I am attempting to use DivNet and can't get divnet() to work when my tables are clustered to a higher percentage similarity. I have attempted clustering to various percentage similarities; 80% similarity results in a matrix with dim(27, 2082).
I have attempted using nproc = 1, 4, 8, 16, and 32 cores. At 1 and 4 cores, I tend to get the above result with "killed". At 8 through 32 cores there is usually an error message about x, y, z cores not returning data, which affects the final calculations. The exact values of x, y, and z change from run to run, and usually represent 25%-50% of the allotted cores. I am currently running this on our computing cluster but will be testing the same objects on our workstation PC while I await your reply. The specs for the cluster node: 32 3.7 GHz cores, 512 GB of memory (8 x 64 GB RDIMM), 10-gigabit network card, 480 GB SSD.
If I cluster at 20% similarity, dim(27, 63), then everything runs as expected.
To add additional context, I have to manually pick the base OTU since there is no common base. My target is a functional gene, and the samples come from different types of environments. As a result, the best candidate base OTU is present in only 56% of the samples, with a mean of 74 nonzero observations per sample. I had planned to run the calculations against several bases to see how the choice influences the results, because visual inspection of my data shows that OTUs for this gene tend to be present in either water or soils but not both. I have not done any filtering to remove rare OTUs, either by requiring presence in > n samples or > m total observations.
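A sketch of how picking and comparing bases might look, assuming `W` is the sample-by-taxon count matrix (taxa in columns) — the candidate count and single-core setting are illustrative:

```r
# Rank candidate base taxa by prevalence: the fraction of samples
# in which each OTU has a nonzero count.
prevalence <- colMeans(W > 0)
candidates <- order(prevalence, decreasing = TRUE)[1:5]

# Re-fit DivNet once per candidate base, then compare the resulting
# diversity estimates across fits for sensitivity to the base choice.
fits <- lapply(candidates, function(b) divnet(W, base = b, ncores = 1))
```

If the estimates are stable across the top few candidates, the low prevalence of any single base is less of a concern.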
I will update you with any additional troubleshooting findings as they come in.