PLOT_EXPLORATORY module breaks with a large number of samples #214
Comments
As part of #221 I reduced the granularity of the density calculation and made a couple of other tweaks that might make a small difference. Fundamentally, though, this pipeline is not optimised for large sample numbers, and we need to think about what that might look like. We probably need to make some plots or report components conditional on smaller sample numbers, or use different summary statistics across sample groups. This won't be fixed imminently, but we'll keep it on the TODO list.
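One way to read "different summary statistics across sample groups" is to stop drawing one distribution per sample once the sample count passes some threshold, and instead reduce each sample to a few quantiles and report a typical value per group. The sketch below illustrates that idea in Python for brevity (the pipeline itself is R). The function names, the `max_samples=100` cut-off and the choice of quartiles are all hypothetical, not shinyngs or pipeline options.

```python
from statistics import median, quantiles


def sample_quartiles(values):
    """Reduce one sample's expression values to (Q1, median, Q3)."""
    q1, q2, q3 = quantiles(values, n=4)  # needs at least 2 values
    return (q1, q2, q3)


def summarise_by_group(values_by_sample, group_of_sample, max_samples=100):
    """Per-sample quartiles for small studies, per-group summaries otherwise.

    values_by_sample: dict of sample ID -> list of expression values
    group_of_sample:  dict of sample ID -> group label (e.g. the contrast variable)
    max_samples:      hypothetical cut-off, not a real pipeline parameter
    """
    # Each sample shrinks to 3 numbers, so memory no longer scales with features
    per_sample = {s: sample_quartiles(v) for s, v in values_by_sample.items()}
    if len(per_sample) <= max_samples:
        return per_sample

    by_group = {}
    for sample, qs in per_sample.items():
        by_group.setdefault(group_of_sample[sample], []).append(qs)

    # Collapse each group to the median Q1, median and Q3 across its samples
    return {
        group: tuple(median(q[i] for q in qs) for i in range(3))
        for group, qs in by_group.items()
    }
```

With ~700 samples and a handful of treatment groups, the plot would then show one summary per group rather than ~700 overlapping curves.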
Description of the bug
When running a large number of samples (~700 in the samplesheet) through the differentialabundance pipeline, the PLOT_EXPLORATORY module fails with an out-of-memory (OOM) error, even when RAM is scaled up to 128 GB via AWS Batch. The R app is generated fine, which suggests the inputs themselves are not the problem. Jonathan Manning has suggested that the issue may lie specifically in the density function of boxplot.R.
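As a rough illustration of why sample count matters, suppose a plotting step reshapes a full assay matrix into long format (one row per feature per sample). Its size then grows linearly with both features and samples. Whether shinyngs's density code actually does this is an assumption here, not something the log confirms. The sketch assumes 16 bytes per row (an 8-byte double value plus two 4-byte integer codes for feature and sample) and ignores R's per-object overhead and any intermediate copies, so it is a lower bound for one copy.

```python
def long_format_bytes(n_features, n_samples, n_assays=1, bytes_per_row=16):
    """Lower-bound memory for one long-format copy of the assay matrices.

    bytes_per_row is an assumption: 8-byte double value + 4-byte feature
    code + 4-byte sample code. Real data frames and plot layers add more.
    """
    return n_features * n_samples * n_assays * bytes_per_row


# 700 samples and 3 assays come from the issue; 60,000 features is a
# hypothetical gene-level matrix size, not taken from the report.
estimate = long_format_bytes(60_000, 700, n_assays=3)
print(f"{estimate / 1e9:.1f} GB")
```

This prints `2.0 GB`. A single copy at this scale is well under 128 GB, so if reshaping is involved the OOM would need repeated copies or large per-sample intermediates; the sketch only shows the linear scaling, not the actual cause.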
Command used and terminal output
The following error occurred:
ERROR ~ Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY (Treatment)'
Caused by:
Essential container in task exited - OutOfMemoryError: Container killed due to memory usage
Command executed:
exploratory_plots.R \
    --sample_metadata "IPSC_21_all_cpd_samples_samplesheet_mod_for_diff_abundance.sample_metadata.tsv" \
    --feature_metadata "matrix_as_anno.feature_metadata.tsv" \
    --assay_files "salmon.merged.gene_counts_length_scaled.assay.tsv,all.normalised_counts.tsv,all.vst.tsv" \
    --contrast_variable "Treatment" \
    --outdir "Treatment" \
    --sample_id_col "sample" --feature_id_col "gene_id" --assay_names "raw,normalised,variance_stabilised" --final_assay "variance_stabilised" --outlier_mad_threshold -5 --palette_name "Set1"
cat <<-END_VERSIONS > versions.yml
"NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:PLOT_EXPLORATORY":
    r-base: $(echo $(R --version 2>&1) | sed 's/^.*R version //; s/ .*$//')
    r-shinyngs: $(Rscript -e "library(shinyngs); cat(as.character(packageVersion('shinyngs')))")
END_VERSIONS
Command exit status:
137
Command output:
[1] "Reading inputs..."
[1] "Creating output paths..."
[1] "Writing boxplots..."
[1] "... static"
null device
1
[1] "Writing density plots..."
[1] "... static"
Command error:
(more omitted..)
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following object is masked from 'package:utils':
findMatches
The following objects are masked from 'package:base':
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics':
rowMedians
The following objects are masked from 'package:matrixStats':
anyMissing, rowMedians
Attaching package: 'shinyngs'
The following object is masked from 'package:MatrixGenerics':
colMedians
The following object is masked from 'package:matrixStats':
colMedians
[1] "Reading inputs..."
[1] "Creating output paths..."
[1] "Writing boxplots..."
[1] "... static"
null device
1
[1] "Writing density plots..."
[1] "... static"
.command.sh: line 8: 223 Killed
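The exit status of 137 follows the shell convention of 128 plus the signal number, i.e. signal 9 (SIGKILL). That matches the container being killed for exceeding its memory limit, rather than R raising an error itself. The small helper below (hypothetical, POSIX only, since signal numbers are platform-specific) decodes such statuses.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal (status - 128).
    """
    if status > 128:
        sig = signal.Signals(status - 128)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited normally with code {status}"


print(describe_exit_status(137))  # killed by SIGKILL (signal 9)
```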