BioTEA analyze options
Luca Visentin edited this page May 31, 2022
The configuration file created by `biotea initialize` follows the YAML format.

The options are divided into sections, each controlling a different aspect of the analysis. The type of each option is given in parentheses. The available options are:

`general`: The general section contains options that control plot size and type, whether to annotate the data, and whether to print extra data snippets during the analysis:

- `show_data_snippets` (`bool`): If `true`, prints extra snippets of data during the analysis. This is useful for checking whether the analysis is doing anything wrong. The default is `true`.
- `annotation_database` (`bool` or `str`): Controls how the data is annotated. Annotating the data lets plots show gene names instead of probe IDs and makes the output more useful. If `true`, the data is annotated with an internal annotation file covering most Agilent and Affymetrix human chips. If `false`, the data is not annotated. Otherwise, a string can be passed with the full name of any annotation package on Bioconductor, which will be downloaded, installed on the fly, and used to source annotations. Note: the package must contain annotations for `SYMBOL`s, `ENSEMBL` IDs, and `GENENAME`s, or the annotation will fail. This might be addressed in a future version of bioTEA.
- `plots`: Various options to control plots:
  - `save_png` (`bool`): If `true`, saves plots in `.png` format. Otherwise, saves them in `.pdf` format.
  - `plot_width` (`int`): The width of the plots, in inches.
  - `plot_height` (`int`): The height of the plots, in inches.
  - `png_resolution` (`int`): The resolution of PNG plots, in pixels per inch.
  - `enumerate_plots` (`bool`): If `true`, each plot is marked with a number, in the order in which it is created.
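
For illustration, a `general` section might look like the YAML fragment below. It assumes `plots` is nested inside `general`, as the option list above implies. Apart from `show_data_snippets: true` (the documented default), the values are placeholders rather than bioTEA's defaults, and `hgu133plus2.db` is only one example of a Bioconductor annotation package.

```yaml
general:
  show_data_snippets: true        # documented default
  annotation_database: true       # use the internal Agilent/Affymetrix annotations
  # annotation_database: "hgu133plus2.db"   # example alternative: a Bioconductor package name
  plots:
    save_png: true                # false saves .pdf files instead
    plot_width: 16                # inches (placeholder value)
    plot_height: 9                # inches (placeholder value)
    png_resolution: 250           # pixels per inch, PNG only (placeholder value)
    enumerate_plots: true
```

Swapping `annotation_database` for a package name only works if that package provides `SYMBOL`, `ENSEMBL`, and `GENENAME` annotations.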

`switches`: The switches section contains parameters that turn parts of the analysis on or off:

- `dryrun` (`bool`): Run the analysis, but do not save any output plots (with the exception of the log file, which is still written). This can be useful to test the analysis before committing to it, especially in combination with the `slowmode` and `show_data_snippets` options.
- `renormalize` (`bool`): Run quantile normalization on the data. This can be useful to normalize "unruly" samples, for which the normalization steps in `prepaffy` and `prepagil` are not enough. Additional plots are saved to show the effect of this extra normalization step.
- `limma` (`bool`): Run differential expression analysis (DEA) with `limma`.
- `rankproduct` (`bool`): Run DEA with `RankProd`.
- `convert_counts` (`bool`): If the input data is count data (e.g. RNA-seq data), set this to `true` to use `voom` to transform the data into continuous values before the analysis. Raw count values are not supported on their own. Defaults to `false`.
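
As a sketch, a `switches` section set up for a first test run, with both DEA methods enabled, could look like this. The values are illustrative; only `convert_counts: false` is a documented default.

```yaml
switches:
  dryrun: true            # test run: no output plots are saved, only the log file
  renormalize: false      # skip the extra quantile normalization step
  limma: true             # run DEA with limma
  rankproduct: true       # run DEA with RankProd
  convert_counts: false   # set to true for count data (e.g. RNA-seq) to apply voom
```

Once the test run looks correct, setting `dryrun` back to `false` runs the analysis with all outputs saved.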

`design`: The design section contains crucial options that define the experimental design used to steer the analysis:

- `experimental_design` (`str`): The experimental design. It must be a comma-delimited set of values, one per input sample, with each value being that sample's label for the experimental variable of interest. Numbers appended to the end of each group label can be used to represent sample pairings. For a detailed guide on how to define this parameter, see the "Design Strings" wiki page.
- `contrasts` (`list of str`): A list of strings of the form `"group1-group2"`, where each "group" is a level in the experimental design. Each value in the list specifies a different contrast of interest. The second group in each contrast (i.e. `group2`) is considered the "background" or "control" status. For a detailed guide on how to define this parameter, see the "Design Strings" wiki page.
- `batches` (`null` or `str`): If `null`, no batch effect is corrected, and all samples are assumed to derive from the same batch. If a `str`, it is treated similarly to the `experimental_design` string, with each level referring to a different batch. Note that `RankProd` cannot correct batch effects unless each batch contains at least two samples for each experimental condition.
- `extra_limma_vars` (`null` or nested list of `str`): If `null`, nothing happens. If a nested list of `str`, each string in the nested list is treated similarly to the `experimental_design` string, adding an additional variable to the `limma` analysis. This can be useful to control for additional confounding variables in the experiment. For an example, see the default configuration file.
- `group_colors` (`list of str`): A list of strings that `R` can interpret as colours. Each colour is paired with a different condition in the experimental design, so at least as many colours as conditions must be specified. A list of possible colours can be found here.
- `filters`: The parameters used by the analysis to filter the data:
  - `log2_expression` (`float`): Filter out any genes that have a lower log2 expression than this value. This value depends strongly on the experiment, and is generally higher for Agilent arrays. The default of `4` works well for Affymetrix arrays.
  - `fold_change` (`float`): Filter out (mark as not differentially expressed) any genes whose absolute fold change is lower than this value. This removes genes that, even if detected as differentially expressed genes (DEGs), change too little for additional validation with other methods (such as PCR) to be possible.
  - `min_groupwise_presence` (`float`): A value between 0 and 1, representing the minimum proportion of samples in a single group of interest in which a gene must pass the `log2_expression` threshold for it to pass the filter in that group. If a gene passes the filter in at least one group, it is retained in the analysis. This allows for a more conservative filtering of the genes.
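
As a sketch, a `design` section for a hypothetical experiment with six unpaired samples (three controls, three treated) might read as follows. The group labels, colours, and threshold values are assumptions chosen for illustration, not bioTEA defaults (except `log2_expression: 4`, the documented default). `filters` is placed under `design` following the option list above, and the exact design-string syntax, including pairings, is described on the "Design Strings" wiki page.

```yaml
design:
  # Hypothetical labels: one value per input sample, in sample order
  experimental_design: "ctrl,ctrl,ctrl,treat,treat,treat"
  contrasts:
    - "treat-ctrl"          # second group ("ctrl") is the control/background
  batches: null             # all samples assumed to come from one batch
  extra_limma_vars: null    # see the default configuration file for the nested form
  group_colors:             # valid R colour names, one per condition
    - "darkgray"
    - "firebrick"
  filters:
    log2_expression: 4            # documented default, suited to Affymetrix arrays
    fold_change: 1.5              # illustrative value only
    min_groupwise_presence: 0.8   # illustrative value, between 0 and 1
```

Here `treat-ctrl` compares the treated samples against the `ctrl` background, and two colours are listed because the design has two conditions.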