Generate nine types of plots (bar, box, density, dot, heatmap, histogram, line, scatter, or violin) from an RNA-seq result table using seaborn.
This software runs in a Python 2.7 environment. Run "conda install -c anaconda seaborn=0.9.0" to install or update seaborn before using rnaseq_figure_plotter.
It is a Python script. Run it with "python rnaseq_figure_plotter.py -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6".
HELP -h, --help show this help message and exit
required function
INPUT -i, --input input file name
TYPE -t, --type choose plot types (bar, box, density, dot, heatmap, histogram, line, scatter, or violin)
general optional function
OUTPUT -o, --output default output; output file name
GENE -g, --gene file name of specific gene ID list; generate "output"_gene_selection.txt file
LOG2 -l, --log default None; calculate log value (log2; 2, log10; 10, loge; e)
LOG2_NUMBER -lgn, --log_number default 0.000000001; add number to avoid -inf for log value
XAXIS -x, --xaxis default samples; choose x-axis (gene, sample, or value)
YAXIS -y, --yaxis default data; choose y-axis (gene, sample, or value)
ZAXIS -z, --zaxis default gene; choose z-axis (gene, sample, or value)
COLOR -c, --color default 1; choose color type (1-10)
FIGURE_SAVE_FORMAT -f, --figure_save_format default pdf; choose format of figures (eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff)
optional parameter for individual plot types
STYLE -s, --style default 1; choose style of figures (1-8)
ZSCORE -zs, --zscore default None; apply z-score transformation in heatmap. Apply the z-score by column (--xaxis) with 1, or by row (--zaxis) with 2
CLUSTER_COLUMN -cc, --cluster_column default None; apply column cluster function for heatmap (on; 1)
CLUSTER_ROW -cr, --cluster_row default None; apply row cluster function for heatmap (on; 1)
SCATTER_COLUMN -sc, --scatter_column default None; type the columns of two samples to compare in the scatter plot. Separate samples with a comma (,). (example "sample1,sample2")
SCATTER_ROW -sr, --scatter_row default None; type the rows of two genes to compare in the scatter plot. Separate genes with a comma (,). (example "geneA,geneB")
The input file must be tab-delimited. The first column and first row should contain gene IDs and sample names, respectively. Gene expression values start from the second column and second row.
An example input file looks like the following:
sample1 sample2 sample3 sample4 sample5
geneA 1 3 5.5 7 2
geneB 100 267 55 79 62
geneC 0.3 0.65 9.5 0.87 2.1
geneD 205 356 78 67 2900
geneE 1001 3001 5500 7001 2001
geneF 2 2 2 2 2
geneG 0.01 0.03 0.5 0.07 0.02
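The layout described above can be read with the standard csv module. This is only a minimal sketch, not the code rnaseq_figure_plotter itself uses; the function name read_expression_table and the inline EXAMPLE_LINES data are hypothetical, and the sketch assumes the header row may or may not start with an empty corner cell.

```python
import csv

# Inline copy of part of the example table; a real run would pass an open file.
EXAMPLE_LINES = [
    "\tsample1\tsample2\tsample3\tsample4\tsample5",
    "geneA\t1\t3\t5.5\t7\t2",
    "geneB\t100\t267\t55\t79\t62",
    "geneG\t0.01\t0.03\t0.5\t0.07\t0.02",
]

def read_expression_table(lines):
    """Return (sample_names, {gene_id: [values]}) from tab-delimited lines."""
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # Assumption: if the header is as wide as the data rows, its first cell
    # is a corner cell (empty or a label) and is not a sample name.
    if body and len(header) == len(body[0]):
        header = header[1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in body}
    return header, table

samples, table = read_expression_table(EXAMPLE_LINES)
print(samples)
print(table["geneA"])
```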
You can choose from nine plot types: bar, box, density, dot, heatmap, histogram, line, scatter, or violin.
All plots are generated by using Seaborn (https://seaborn.pydata.org).
Provide the output file name.
List gene IDs in the first column, one per line (separated by \n).
An example gene ID list file looks like the following:
geneA
geneD
geneG
The (-g, --gene) function automatically selects the expression values for the gene IDs provided, and writes them to the "output"_gene_selection.txt file.
An example "output"_gene_selection.txt file looks like the following:
geneA 1 3 5.5 7 2
geneD 205 356 78 67 2900
geneG 0.01 0.03 0.5 0.07 0.02
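The selection step can be pictured as a simple filter over the expression table. The sketch below is illustrative only: select_genes is a hypothetical helper, and it assumes that rows are written in gene-list order and that IDs missing from the table are skipped, neither of which is stated above.

```python
def select_genes(table, gene_ids):
    """Keep only listed genes (assumption: output follows gene-list order)."""
    return [(gene, table[gene]) for gene in gene_ids if gene in table]

# Part of the example expression table, keyed by gene ID.
table = {
    "geneA": [1, 3, 5.5, 7, 2],
    "geneB": [100, 267, 55, 79, 62],
    "geneD": [205, 356, 78, 67, 2900],
    "geneG": [0.01, 0.03, 0.5, 0.07, 0.02],
}

# Contents of the gene ID list file: one ID per line.
gene_list_text = "geneA\ngeneD\ngeneG\n"
gene_ids = [line.strip() for line in gene_list_text.splitlines() if line.strip()]

selected = select_genes(table, gene_ids)
for gene, values in selected:
    # Tab-delimited rows, as in the gene selection file.
    print("\t".join([gene] + [str(v) for v in values]))
```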
Apply a log2, log10, or loge transformation to the gene expression values by passing 2, 10, or e, respectively, to the (-l, --log) function. By default, the (-l, --log) function is off (None).
To avoid -inf in log-transformed values when generating plots, the (-lgn, --log2_number) function adds a tiny value (default 0.000000001). You can customize this value by typing a number (for example 0, 0.000001, or 0.000000000000000001).
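The effect of the added value can be sketched with the standard math module. This is an illustration of the idea rather than the tool's own code; log_transform and its parameters are hypothetical names, and the default offset mirrors the -lgn default quoted above.

```python
import math

def log_transform(values, base="2", offset=1e-9):
    """Return log(value + offset) in the chosen base ("2", "10", or "e")."""
    log_base = math.e if base == "e" else float(base)
    return [math.log(v + offset, log_base) for v in values]

expression = [0.0, 1.0, 8.0]

# With the small offset, a zero count maps to a large negative finite number.
print(log_transform(expression, base="2"))

# With offset 0, math.log(0) raises ValueError (NumPy would return -inf),
# which is the case the offset is meant to avoid.
try:
    log_transform(expression, base="2", offset=0)
except ValueError as error:
    print("zero value without offset:", error)
```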
The defaults for the x-axis, y-axis, and z-axis are sample, data, and gene, respectively. Sample, data, and gene refer to sample name, gene expression value, and gene ID, respectively.
The following table shows which axes you can modify for each plot type.
plots x-axis y-axis legend
bar x y z*
box x y
density x*
dot x y z*
heatmap x* z*
histogram x*
line x* y(data) z*
scatter
violin x y
*(sample or gene)
The Seaborn color palette (https://seaborn.pydata.org/tutorial/color_palettes.html) is used for the color settings. The settings are as follows:
settings palette color description
1 RdBu_r (default) red to blue
2 Reds red to white
3 Blues blue to white
4 RdYlBu_r red to yellow to blue
5 RdGy_r red to gray
6 Paired see the seaborn website
7 cubehelix see the seaborn website
8 muted see the seaborn website
9 hls see the seaborn website
10 Set2 see the seaborn website
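The table above amounts to a lookup from the -c setting to a seaborn palette name. A minimal sketch of that lookup follows; PALETTES and palette_name are hypothetical names, and how the tool actually applies the palette is not shown here.

```python
# Setting number -> seaborn palette name, copied from the table above.
PALETTES = {
    1: "RdBu_r", 2: "Reds", 3: "Blues", 4: "RdYlBu_r", 5: "RdGy_r",
    6: "Paired", 7: "cubehelix", 8: "muted", 9: "hls", 10: "Set2",
}

def palette_name(setting=1):
    """Translate a color setting (1-10) into a palette name."""
    try:
        return PALETTES[int(setting)]
    except (KeyError, ValueError):
        raise ValueError("color setting must be an integer from 1 to 10")

# With seaborn installed, the name could be passed to
# seaborn.color_palette(name) to obtain the colors.
print(palette_name(5))
```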
Choose the format for saved figures. The default is pdf; you can also choose eps, jpeg, jpg, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff.
Seaborn set_style and set_context (https://seaborn.pydata.org/tutorial/aesthetics.html) are used for the style settings.
set_style and set_context control the background and the size (paper: small; talk: large), respectively. The settings are as follows:
settings set_style set_context
1 whitegrid paper
2 whitegrid talk
3 white paper
4 white talk
5 darkgrid paper
6 darkgrid talk
7 dark paper
8 dark talk
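Because the eight settings are every combination of four backgrounds and two sizes, the table can be reproduced with a small calculation. The sketch below is illustrative; style_for is a hypothetical helper, not a function of the tool.

```python
STYLES = ["whitegrid", "white", "darkgrid", "dark"]   # set_style values
CONTEXTS = ["paper", "talk"]                          # set_context values

def style_for(setting):
    """Map a style setting (1-8) to its (set_style, set_context) pair."""
    if not 1 <= setting <= 8:
        raise ValueError("style setting must be between 1 and 8")
    index = setting - 1
    # Each background appears twice in the table: first paper, then talk.
    return STYLES[index // 2], CONTEXTS[index % 2]

for number in range(1, 9):
    print(number, style_for(number))

# With seaborn installed, the pair could be applied with
# seaborn.set_style(style) and seaborn.set_context(context).
```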
The (-zs, --zscore) function can be used for heatmaps. Use 1 to apply the z-score by column (-x, --xaxis) and 2 to apply it by row (-z, --zaxis).
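The difference between the two z-score directions can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's implementation: zscore and zscore_matrix are hypothetical names, the population standard deviation is used (seaborn or pandas may use the sample standard deviation instead), and constant data is mapped to 0.0 rather than NaN.

```python
import math

def zscore(values):
    """Standardize one list of numbers to mean 0 and unit variance."""
    mean = sum(values) / float(len(values))
    variance = sum((v - mean) ** 2 for v in values) / float(len(values))
    std = math.sqrt(variance)
    if std == 0:
        # Assumption: constant data (such as geneF) becomes all zeros.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def zscore_matrix(rows, setting):
    """setting 1: standardize each column; setting 2: standardize each row."""
    if setting == 2:
        return [zscore(list(row)) for row in rows]
    if setting == 1:
        columns = [zscore(list(column)) for column in zip(*rows)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("z-score setting must be 1 (column) or 2 (row)")

rows = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]   # two genes, three samples
print(zscore_matrix(rows, 1))
print(zscore_matrix(rows, 2))
```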
Apply clustering to columns and/or rows by passing 1 to (-cc, --cluster_column) and/or (-cr, --cluster_row).
Specify two columns (samples) or two rows (genes) to compare with the (-sc, --scatter_column) or (-sr, --scatter_row) function, respectively. This setting is required for the scatter plot.
The (-sc, --scatter_column) and (-sr, --scatter_row) functions take the two datasets as "x-axis,y-axis", with the samples or genes separated by a comma (,). Examples for (-sc, --scatter_column) and (-sr, --scatter_row) are "sample1,sample3" and "geneA,geneG", respectively. The color cannot be changed in the scatter plot.
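How an "x-axis,y-axis" argument turns into plot coordinates can be sketched as follows. The helper names parse_pair, column_pair, and row_pair are hypothetical, and the tool's real parsing may differ; the sketch only shows that a sample pair yields one point per gene, while a gene pair yields one point per sample.

```python
def parse_pair(argument):
    """Split "nameX,nameY" into its two names."""
    names = [name.strip() for name in argument.split(",")]
    if len(names) != 2 or not all(names):
        raise ValueError("expected two names separated by a comma")
    return names[0], names[1]

def column_pair(samples, table, argument):
    """Points for two samples: one (x, y) per gene, in sorted gene order."""
    x_name, y_name = parse_pair(argument)
    x_index, y_index = samples.index(x_name), samples.index(y_name)
    return [(table[g][x_index], table[g][y_index]) for g in sorted(table)]

def row_pair(table, argument):
    """Points for two genes: one (x, y) per sample."""
    x_name, y_name = parse_pair(argument)
    return list(zip(table[x_name], table[y_name]))

samples = ["sample1", "sample2", "sample3"]
table = {"geneA": [1, 3, 5.5], "geneG": [0.01, 0.03, 0.5]}

print(column_pair(samples, table, "sample1,sample3"))
print(row_pair(table, "geneA,geneG"))
```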