MutationalPatterns

The MutationalPatterns R package provides a comprehensive set of flexible functions for easy finding and plotting of mutational patterns in Single Nucleotide Variant (SNV) data.

Getting started

Installation

This package is dependent on R version 3.2.4

Install and load devtools package

install.packages("devtools")
library(devtools)

Install and load MutationalPatterns package

options(unzip = 'internal')
install_github("CuppenResearch/MutationalPatterns")
library(MutationalPatterns)

Reference genome

List all available reference genomes (BSgenome)

available.genomes()

Download and load your reference genome of interest

ref_genome = "BSgenome.Hsapiens.UCSC.hg19"
source("http://bioconductor.org/biocLite.R")
biocLite(ref_genome)
library(ref_genome, character.only = T)

Load SNV data

Find package example data

vcf_files = list.files(system.file("extdata", package="MutationalPatterns"), full.names = T)

Load a single vcf file

vcf = read_vcf(vcf_files[1], "sample1")

Load a list of vcf files

sample_names = c("colon1", "colon2", "colon3", "intestine1", "intestine2", "intestine3", "liver1", "liver2", "liver3")
vcfs = read_vcf(vcf_files, sample_names)

Include relevant metadata in your analysis, e.g. donor id, cell type, age, tissue type, mutant or wild type

tissue = c("colon", "colon", "colon", "intestine", "intestine", "intestine", "liver", "liver", "liver")

Analyses

Mutation types

Retrieve base substitutions from vcf object as "REF>ALT"

get_muts(vcfs[[1]])

Retrieve base substitutions from vcf and convert to the 6 types of base substitution types that are distinguished by convention: C>A, C>G, C>T, T>A, T>C, T>G. For example, if the reference allele is G and the alternative allele is T (G>T), this functions returns the G:C>T:A mutation as a C>A mutation.

get_types(vcfs[[1]])

Retrieve the context (1 base upstream and 1 base downstream) of the positions in the vcf object from the reference genome.

get_mut_context(vcfs[[1]], ref_genome)

Retrieve the types and context of the base substitution types for all positions in the vcf object. For the base substitutions that are converted to the conventional base substitution types, the reverse complement of the context is returned.

get_type_context(vcfs[[1]], ref_genome)

Count mutation type occurences for one vcf object

type_occurences = mut_type_occurences(vcfs[1], ref_genome)

Count mutation type occurences for all samples in a list of vcf objects

type_occurences = mut_type_occurences(vcfs, ref_genome)

Mutation spectrum

Plot mutation spectrum over all samples. Plottes is the mean relative contribution of each of the 6 base substitution types. Error bars indicate standard deviation over all samples. The n indicates the total number of mutations in the set.

plot_spectrum(type_occurences)

Plot mutation spectrum with distinction between C>T at CpG sites

plot_spectrum(type_occurences, CT = T)

Plot spectrum without legend

plot_spectrum(type_occurences, CT = T, legend = F)

Plot spectrum for each tissue separately

plot_spectrum(type_occurences, by = tissue, CT = T)

Specify 7 colors for spectrum plotting

my_colors = c("pink", "orange", "blue", "lightblue", "green", "red", "purple")
plot_spectrum(type_occurences, CT = T, legend = T, colors = my_colors)

96 Mutation Profile

Make 96 trinucleodide mutation count matrix

mut_matrix = make_mut_matrix(vcf_list = vcfs, ref_genome = ref_genome)

Plot 96 profile of three samples

plot_96_profile(mut_matrix[,c(1,4,7)])

Compare Profiles

Compare two profiles by bootstrapping analysis and get a p-value for their difference

comparison <- profile_bootstrap_comparison(mut_matrix[,1], #colon1
                                           mut_matrix[,4], #intestine1
                                           random.seed = 123)
comparison$overallPvalue

> 0.136

Identify what mutations cause significant differences between two profiles and get their p-values

comparison <- profile_bootstrap_comparison(mut_matrix[,1], #colon1
                                           mut_matrix[,7], #liver1
                                           random.seed = 123)
comparison$mutTypePvalues_corrected[comparison$mutTypePvalues_corrected < 0.05]

GCG C>G    ACG C>T    CCG C>T    GCG C>T    TCA C>T    TCG C>T    ATG T>C    CTG T>G    GTG T>G    TTC T>G 
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.03919184 0.00000000 0.00000000

Extract Signatures

Estimate optimal rank for NMF mutation matrix decomposition

estimate_rank(mut_matrix, rank_range = 2:5, nrun = 50)

Extract and plot 3 signatures

nmf_res = extract_signatures(mut_matrix, rank = 3)
# provide signature names (optional)
colnames(nmf_res$signatures) = c("Signature A", "Signature B" , "Signature C")
# plot signatures
plot_96_profile(nmf_res$signatures)

Plot signature contribution

# provide signature names (optional)
rownames(nmf_res$contribution) = c("Signature A", "Signature B" , "Signature C")
# plot signature contribution
plot_contribution(nmf_res$contribution, coord_flip = T)

Compare reconstructed mutation profile with original mutation profile

plot_compare_profiles(mut_matrix[,1], nmf_res$reconstructed[,1], profile_names = c("Original", "Reconstructed"))

Fit 96 mutation profiles to known signatures

Download signatures from pan-cancer study Alexandrov et al.

url = "http://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt"
cancer_signatures = read.table(url, sep = "\t", header = T)
# reorder (to make the order of the trinucleotide changes the same)
cancer_signatures = cancer_signatures[order(cancer_signatures[,1]),]
# only signatures in matrix
cancer_signatures = as.matrix(cancer_signatures[,4:33])

Fit mutation matrix to cancer signatures. This function finds the optimal linear combination of mutation signatures that most closely reconstructs the mutation matrix by solving nonnegative least-squares constraints problem

fit_res = fit_to_signatures(mut_matrix, cancer_signatures)
# select signatures with some contribution
select = which(rowSums(fit_res$contribution) > 0)
# plot contribution
plot_contribution(fit_res$contribution[select,], coord_flip = T)

Compare reconstructed mutation profile of sample 1 using cancer signatures with original profile

plot_compare_profiles(mut_matrix[,1], fit_res$reconstructed[,1], profile_names = c("Original", "Reconstructed \n cancer signatures"))

Rainfall plot

A rainfall plot visualizes mutation types and intermutation distance. Rainfall plots can be used to visualize the distribution of mutations along the genome or a subset of chromosomes. The y-axis corresponds to the distance of a mutation with the previous mutation and is log10 transformed. Drop-downs from the plots indicate clusters or "hotspots" of mutations.

Make rainfall plot of sample 1 over all autosomal chromosomes

# define autosomal chromosomes
chromosomes = seqnames(get(ref_genome))[1:22]
# make rainfall plot
plot_rainfall(vcfs[[1]], title = names(vcfs[1]), ref_genome = ref_genome, chromosomes = chromosomes, cex = 1.5)

Make rainfall plot of sample 1 over chromosome 1

chromosomes = seqnames(get(ref_genome))[1]
plot_rainfall(vcfs[[1]], title = names(vcfs[1]), ref_genome = ref_genome, chromosomes = chromosomes[1], cex = 2)

Things to do

1 Genomic distribution

genomic regions bed input
surveyed area??

2 Strand bias analysis

get gene bodies for each ref genome

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
R		R
images		images
inst/extdata		inst/extdata
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
MutationalPatterns.Rproj		MutationalPatterns.Rproj
NAMESPACE		NAMESPACE
README.md		README.md
manual_code.R		manual_code.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MutationalPatterns

Getting started

Installation

Reference genome

Load SNV data

Analyses

Mutation types

Mutation spectrum

96 Mutation Profile

Compare Profiles

Extract Signatures

Fit 96 mutation profiles to known signatures

Rainfall plot

Things to do

About

Releases

Packages

Languages

License

jgoldmann/MutationalPatterns

Folders and files

Latest commit

History

Repository files navigation

MutationalPatterns

Getting started

Installation

Reference genome

Load SNV data

Analyses

Mutation types

Mutation spectrum

96 Mutation Profile

Compare Profiles

Extract Signatures

Fit 96 mutation profiles to known signatures

Rainfall plot

Things to do

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages