R "killed" or cores not returning data #136

AlexaBennett · 2022-11-04T14:58:46Z

I am attempting to use DivNet and can't get divnet() to work when my tables are clustered to a higher percentage similarity. I have attempted clustering to various percentage similarities; 80% similarity results in a matrix with dim(27, 2082).

divnet_output` <- ordered_80p_samples %>%
+   divnet(tuning = "careful", base = "OTU140", ncores = nproc)
> Removing absent taxa!
|                                                                      |   0%Killed

I have attempted using nproc = 1, 4, 8, 16, and 32 cores. At one and 4 cores, I tend to get the above result with "killed". At 8 through 32 cores there is usually an error message about x, y, z cores not returning data, and it will affect the final calculations. The exact values for x, y, and z will change over time, and usually represent 25%-50% of the allotted cores. I am currently running this on our computing cluster but will be testing the same objects on our workstation PC while I await your reply. The latest specs for the cluster node: 32 3.7 GHz cores, 512GB of memory (8 x 64 GB RDIMM), 10gigabit network card, 480 GB SSD.

If I cluster 20% similarity, dim(27, 63), then everything runs as expected.

To add additional context, I have to physically pick the base OTU since there is no common base. My target is a functional gene, and the samples contain different types of environments. Therefore, one of the best base OTU is present in only 56% of the samples and a mean of 74 observations per sample !=0. I had planned to run the calculations on several bases to see how it influences the results. because visual inspection of my data shows OTUs tend to be present in water or soils and not both for this gene. I have not done any filtering to remove rare OTUs either by requiring the presence in > n samples or > m total observations.

I will update you with any additional troubleshooting findings as they come in.

The text was updated successfully, but these errors were encountered:

AlexaBennett · 2022-11-17T20:39:00Z

If I adjust the divnet() parameters to the following, I can get divnet() to work for datasets up to dim(27, 2082). It is worth noting that increasing ncores will cause a failure.

tuning = "fast", ncores =1, network = "diagonal"

I tried drastically decreasing iterations to see what happens, and the datasets with dim(27, >5,000) continue to fail.

divnet_output <- ordered_97p %>% divnet(tuning = list(EMiter = 6, EMburn = 3, MCiter = 10, MCburn = 5), ncores = 1, network = "diagonal")

I will implement divnet-rs and report back.

mooreryan · 2023-02-02T06:29:48Z

Were you ever able to make any progress with this?

AlexaBennett · 2023-03-14T00:38:49Z

@mooreryan I currently have it DivNet working in R and have done benchmarking. Something about my data causes relatively small sets to fail due to memory limits. We have decided not to use the rust implementation for now in order to keep the pipeline friendly for bioinformaticians/comp. sci. users.

mooreryan · 2023-03-14T16:30:38Z

Could you posts the benchmarks, and the sizes of the data you're using that causes failure?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R "killed" or cores not returning data #136

R "killed" or cores not returning data #136

AlexaBennett commented Nov 4, 2022 •

edited

Loading

AlexaBennett commented Nov 17, 2022

mooreryan commented Feb 2, 2023

AlexaBennett commented Mar 14, 2023

mooreryan commented Mar 14, 2023

R "killed" or cores not returning data #136

R "killed" or cores not returning data #136

Comments

AlexaBennett commented Nov 4, 2022 • edited Loading

AlexaBennett commented Nov 17, 2022

mooreryan commented Feb 2, 2023

AlexaBennett commented Mar 14, 2023

mooreryan commented Mar 14, 2023

AlexaBennett commented Nov 4, 2022 •

edited

Loading