R "killed" or cores not returning data #136

Open
AlexaBennett opened this issue Nov 4, 2022 · 4 comments

@AlexaBennett

AlexaBennett commented Nov 4, 2022

I am attempting to use DivNet and can't get divnet() to work when my tables are clustered to a higher percentage similarity. I have attempted clustering to various percentage similarities; 80% similarity results in a matrix with dim(27, 2082).

divnet_output <- ordered_80p_samples %>%
+   divnet(tuning = "careful", base = "OTU140", ncores = nproc)
> Removing absent taxa!
|                                                                      |   0%Killed

I have attempted using nproc = 1, 4, 8, 16, and 32 cores. At 1 and 4 cores, I tend to get the above result with "Killed". At 8 through 32 cores, there is usually an error message about x, y, z cores not returning data, and it will affect the final calculations. The exact values for x, y, and z will change over time, and usually represent 25%-50% of the allotted cores. I am currently running this on our computing cluster but will be testing the same objects on our workstation PC while I await your reply. The latest specs for the cluster node: 32 3.7 GHz cores, 512 GB of memory (8 x 64 GB RDIMM), 10-gigabit network card, 480 GB SSD.

If I cluster to 20% similarity, dim(27, 63), then everything runs as expected.

To add context, I have to manually pick the base OTU since there is no common base. My target is a functional gene, and the samples come from different types of environments. As a result, one of the best base OTUs is present in only 56% of the samples and has a mean of 74 observations per sample (counting only samples where it is != 0). I had planned to run the calculations on several bases to see how the choice influences the results, because visual inspection of my data shows that OTUs for this gene tend to be present in water or in soils but not both. I have not done any filtering to remove rare OTUs, either by requiring presence in > n samples or > m total observations.
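
For reference, the rare-OTU filtering mentioned above (presence in > n samples, > m total observations) could be sketched roughly as follows in base R. This is only an illustration: `filter_rare_otus`, its `min_samples` and `min_total` arguments, and the toy matrix are hypothetical names and made-up numbers, not part of DivNet, and a plain samples-by-OTUs count matrix is assumed.

```r
# Hypothetical helper (not part of DivNet): keep OTUs observed in more than
# `min_samples` samples AND with more than `min_total` total observations.
# Assumes `counts` is a numeric matrix with samples in rows and OTUs in columns.
filter_rare_otus <- function(counts, min_samples = 1, min_total = 1) {
  prevalence <- colSums(counts > 0)  # number of samples in which each OTU occurs
  totals <- colSums(counts)          # total observations per OTU
  keep <- prevalence > min_samples & totals > min_total
  counts[, keep, drop = FALSE]       # drop = FALSE keeps the matrix shape
}

# Toy data: 4 samples x 5 OTUs (made-up numbers, for illustration only)
toy <- matrix(
  c(10, 0, 3, 0, 1,
     8, 0, 0, 0, 0,
    12, 5, 0, 0, 0,
     9, 0, 2, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5))
)

filtered <- filter_rare_otus(toy, min_samples = 1, min_total = 2)
dim(filtered)
colnames(filtered)
```

Whichever OTU is passed as `base` would need to survive the filter, so the thresholds have to be chosen with the base OTU's prevalence in mind.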

I will update you with any additional troubleshooting findings as they come in.

@AlexaBennett
Author

If I adjust the divnet() parameters to the following, I can get divnet() to work for datasets up to dim(27, 2082). It is worth noting that increasing ncores will cause a failure.

tuning = "fast", ncores = 1, network = "diagonal"

I tried drastically decreasing iterations to see what happens, and the datasets with dim(27, >5,000) continue to fail.

divnet_output <- ordered_97p %>% divnet(tuning = list(EMiter = 6, EMburn = 3, MCiter = 10, MCburn = 5), ncores = 1, network = "diagonal")

I will implement divnet-rs and report back.

@mooreryan
Contributor

Were you ever able to make any progress with this?

@AlexaBennett
Copy link
Author

@mooreryan I currently have DivNet working in R and have done benchmarking. Something about my data causes relatively small sets to fail due to memory limits. We have decided not to use the Rust implementation for now in order to keep the pipeline friendly for bioinformaticians/comp. sci. users.

@mooreryan
Contributor

Could you post the benchmarks and the sizes of the data that cause the failure?
