The batch job was used to map the RNA sequencing data to the available exon annotations from GENCODE using bedops, samtools and bedtools.
- input:
- output folder
- first bam file (reads)
- second bam file (reads)
- output:
- list of exons with mapped reads
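
The batch job itself relies on bedops, samtools and bedtools. The following is only a minimal Python sketch of the underlying idea (counting reads that overlap each exon), not a reproduction of the job. The `Interval` type, the function name and the toy coordinates are hypothetical, and intervals are assumed to be 0-based and half-open as in BED files.

```python
from collections import namedtuple

# Hypothetical record type; coordinates are 0-based, half-open (BED convention).
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def count_reads_per_exon(exons, reads):
    """Return {exon: number of overlapping reads} for exons with at least one read."""
    counts = {}
    for exon in exons:
        n = sum(
            1
            for read in reads
            # Two half-open intervals overlap when each starts before the other ends.
            if read.chrom == exon.chrom
            and read.start < exon.end
            and exon.start < read.end
        )
        if n > 0:
            counts[exon] = n
    return counts


# Toy data, invented for illustration only.
exons = [
    Interval("chr1", 100, 200),
    Interval("chr1", 300, 400),
    Interval("chr2", 100, 200),
]
reads = [
    Interval("chr1", 150, 250),
    Interval("chr1", 190, 290),
    Interval("chr1", 500, 600),
]
counts = count_reads_per_exon(exons, reads)
```

The quadratic scan is acceptable for a toy example; on real BAM files the sorted-interval sweeps implemented by the tools named above are the practical choice.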
The script was used to calculate the maximum a posteriori (MAP) probability for the inclusion of the investigated exons.
- input:
- data from PreProcessing_RNAseqData.sbatch
- cell type (IMR90, Gm12878 or H1hesc)
- output:
- A table, written to the directory specified at the end of the script, containing the maximum a posteriori probabilities for the inclusion of the exons.
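
This description does not state which probabilistic model the script uses. As a hedged illustration, the sketch below assumes a binomial likelihood over inclusion and exclusion reads with a Beta prior, for which the maximum a posteriori estimate has a closed form. The function name, the read-count inputs and the default prior `alpha = beta = 2` are assumptions, and the original script is written in R rather than Python.

```python
def map_inclusion_probability(inclusion_reads, exclusion_reads, alpha=2.0, beta=2.0):
    """MAP estimate of the exon inclusion probability.

    Assumed model (not taken from the original script):
    Beta(alpha, beta) prior with a binomial likelihood, so the posterior is
    Beta(alpha + inclusion_reads, beta + exclusion_reads) and its mode is
    (a - 1) / (a + b - 2).
    """
    a_post = alpha + inclusion_reads
    b_post = beta + exclusion_reads
    if a_post <= 1 or b_post <= 1:
        # The mode formula only holds when both posterior parameters exceed 1.
        raise ValueError("posterior mode is undefined for these parameters")
    return (a_post - 1) / (a_post + b_post - 2)


# Toy counts, invented for illustration: 8 inclusion reads, 2 exclusion reads.
psi = map_inclusion_probability(8, 2)
# With no reads at all, the estimate falls back to the prior mode.
psi_no_data = map_inclusion_probability(0, 0)
```

The prior pulls estimates for exons with few reads towards 0.5, which is one common motivation for preferring a MAP estimate to the raw read ratio.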
The script declares a given CpG as an mCpG when its methylation rate x satisfies x >= t1 && x <= t2. The script can also be modified to use a smoothed distribution from the bsseq package.
- input:
- bed files (WGBS of the cell)
- cell type (IMR90, Gm12878 or H1hesc)
- lower threshold for a CpG to be defined as an mCpG (t1)
- upper threshold for a CpG to be defined as an mCpG (t2)
- output:
- A list of identified mCpGs for each cell type.
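
The threshold rule can be sketched as follows. This is a minimal Python illustration, not the original R script; the tuple layout `(chrom, start, end, methylation_rate)`, the function name and the toy values are assumptions.

```python
def call_mcpgs(cpgs, t1, t2):
    """Keep CpGs whose methylation rate x satisfies t1 <= x <= t2 (both bounds inclusive).

    Each CpG is assumed to be a tuple (chrom, start, end, methylation_rate).
    """
    if t1 > t2:
        raise ValueError("lower threshold t1 must not exceed upper threshold t2")
    return [cpg for cpg in cpgs if t1 <= cpg[3] <= t2]


# Toy records, invented for illustration.
cpgs = [
    ("chr1", 10, 11, 0.95),
    ("chr1", 50, 51, 0.40),
    ("chr1", 90, 91, 0.80),
]
mcpgs = call_mcpgs(cpgs, t1=0.8, t2=1.0)
```

Because both comparisons are inclusive, a CpG whose rate equals `t1` or `t2` exactly is still called an mCpG.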
The batch job was used to apply Cufflinks to the transcript data of the available reads.
- input:
- output folder
- bam file
- cell type (IMR90, Gm12878 or H1hesc)
- output:
- Cufflinks standard output
The script was used to examine some characteristics of the exons and introns (such as the length comparison or the CpG/mCpG ratio comparison) and to filter out some of the exons.
- input:
- data from ProabilityInclusion.R
- data from PreProcessingWGBS.R
- data from runCufflinks.sbatch
- cell type (IMR90, Gm12878 or H1hesc)
- output:
- various plots
The script was used to define the feature matrix for the ANN and the GBM. The cell type has to be changed at the beginning of the script.
- input:
- data from AnalysisExonsIntrons.R
- output:
- list of features for the cell type
The script writes the input matrix for the deep learning algorithm. The cell type can be changed at the beginning of the script.
- input:
- data from CreateFeatures.R
- output:
- Matrix for the training of an ANN.
The script was used to plot the methylation profiles before the data was fed to the ANN.
- input:
- data from CreateFeatures.R
- output:
- various plots
The script was used to run a Metropolis-Hastings (MH) algorithm to fine-tune the parameter set of a GBM. The cell type can be changed at the beginning of the script. A vector, also at the beginning of the script, selects the features for the optimisation.
- input:
- data from CreateFeatures.R
- TRUE or FALSE
- TRUE = The script uses a default parameter set for the fine tuning found by grid search.
- FALSE = The script uses a random parameter set and tries to fine tune the set.
- output:
- A log file that contains the last line of the MH algorithm (the best model).
- (optional) If the script is run as a batch job, the log file should contain all steps of the MH algorithm.
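
To illustrate the idea of tuning parameters with a Metropolis-Hastings search, here is a minimal Python sketch. It is not the original R script: the toy loss stands in for the validation error of a trained GBM, the parameter names `learning_rate` and `subsample` are hypothetical, and the target density is assumed to be proportional to `exp(-loss / temperature)` with a symmetric Gaussian proposal.

```python
import math
import random


def metropolis_hastings(loss, start, step, n_steps=2000, temperature=0.1, seed=0):
    """Random-walk Metropolis-Hastings over a dict of numeric parameters.

    Returns the best parameter set seen, its loss, and a log with one entry
    per step (step index, current parameters, current loss).
    """
    rng = random.Random(seed)
    current = dict(start)
    current_loss = loss(current)
    best, best_loss = dict(current), current_loss
    log = []
    for i in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = {k: v + rng.gauss(0.0, step[k]) for k, v in current.items()}
        proposal_loss = loss(proposal)
        # Log acceptance ratio for a target proportional to exp(-loss / temperature).
        log_ratio = (current_loss - proposal_loss) / temperature
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_loss = proposal, proposal_loss
            if current_loss < best_loss:
                best, best_loss = dict(current), current_loss
        log.append((i, dict(current), current_loss))
    return best, best_loss, log


def toy_loss(params):
    """Stand-in for a model's validation error; minimum at (0.1, 0.8)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


start = {"learning_rate": 0.5, "subsample": 0.5}
step = {"learning_rate": 0.05, "subsample": 0.05}
best, best_loss, log = metropolis_hastings(toy_loss, start, step)
```

In a real run, `loss` would train and evaluate a model for each proposal, and proposals would need to be constrained to valid parameter ranges, which this sketch omits.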
The script runs a Metropolis-Hastings (MH) algorithm to fine-tune the parameter settings for an ANN. The architecture stays constant, but all other parameters change. The cell type can be changed at the beginning of the script. The features selected for the optimisation can be changed in the section "Load Data".
- input:
- data from WriteANNMatrix.R
- units (number of units in the hidden layer)
- batch size (start value of the batch size)
- rate (start value of the learning rate)
- dropout (start value of the dropout)
- regularisation (start value of the l2 regularisation)
- momentum (start value of the momentum parameter)
- output:
- A log file that contains the last line of the MH algorithm (the best model).
- (optional) If the script is run as a batch job, the log file should contain all steps of the MH algorithm.
The script was used to analyse the predictions of the best ANN.
- input:
- data from WriteANNMatrix.R
- output:
- various plots
GBM.R: The script was used to analyse the predictions of the best GBM.
- input:
- data from WriteANNMatrix.R
- output:
- various plots