## More about the theme() function
The **theme()** function gives precise control over the non-data elements of a plot, such as axis text, ticks and labels, or legend title, text and position.
<br>
More details [here](https://ggplot2.tidyverse.org/reference/theme.html)
```{r, eval=T}
# Let's use the project_long object from exercise 12, and plot boxplots
boxp <- ggplot(data=project_long, aes(x=variable, y=value, color=expr_limits)) +
  geom_boxplot()
# Remove the legend title:
boxp + theme(legend.title=element_blank())
# Change font of legend text
boxp + theme(legend.title=element_blank(),
             legend.text = element_text(colour="red", size = 8, face = "bold"))
# Put legend on the top of the plot
boxp + theme(legend.title=element_blank(),
             legend.text = element_text(colour="red", size = 8, face = "bold"),
             legend.position="top")
# Rotate x-axis labels
boxp + theme(legend.title=element_blank(),
             legend.text = element_text(colour="red", size = 8, face = "bold"),
             legend.position="top",
             axis.text.x = element_text(angle = 90))
# Add a color to the plot's background
boxp + theme(legend.title=element_blank(),
             legend.text = element_text(colour="red", size = 8, face = "bold"),
             legend.position="top",
             axis.text.x = element_text(angle = 90),
             plot.background = element_rect(fill = "yellow"))
```
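The same theme() settings are repeated in each call above. As a minimal sketch, they can also be stored once in an object and added to any plot. The `toy` data frame, its columns and the name `my_theme` are hypothetical and only stand in for `project_long`.

```{r}
library(ggplot2)

# Hypothetical toy data standing in for project_long (illustration only)
set.seed(1)
toy <- data.frame(variable = rep(c("A", "B"), each = 10),
                  value = c(rnorm(10), rnorm(10, mean = 1)),
                  group = rep(c("low", "high"), times = 10))

# Store the theme settings once...
my_theme <- theme(legend.title = element_blank(),
                  legend.position = "top",
                  axis.text.x = element_text(angle = 90))

# ...and add them to a plot like any other layer
toy_plot <- ggplot(data = toy, aes(x = variable, y = value, color = group)) +
  geom_boxplot() +
  my_theme
toy_plot
```

Keeping the settings in one object makes it easier to apply the same look to several plots.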
## Volcano plots
A volcano plot is a type of scatter plot that represents the differential expression of features (genes, for example): the x-axis typically shows the fold change and the y-axis the p-value.
<br>
```{r}
# Download the data we will use for plotting
download.file("https://raw.githubusercontent.com/sarahbonnin/CRG_RIntroduction/master/de_df_for_volcano.rds", "de_df_for_volcano.rds", method="curl")
# The RDS format is used to save a single R object to a file, and to restore it.
# Extract that object in the current session:
tmp <- readRDS("de_df_for_volcano.rds")
# remove rows that contain NA values
de <- tmp[complete.cases(tmp), ]
```
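To illustrate the two steps above without downloading anything, here is a small sketch of an RDS round trip followed by the NA filtering. The `toy` data frame and the temporary file are made up for the example.

```{r}
# Hypothetical toy data frame (illustration only)
toy <- data.frame(gene = c("g1", "g2", "g3"),
                  pvalue = c(0.01, NA, 0.2))

# saveRDS() writes a single R object to a file; readRDS() restores it
rds_file <- tempfile(fileext = ".rds")
saveRDS(toy, rds_file)
restored <- readRDS(rds_file)

# The restored object is the same as the one that was saved
identical(toy, restored)

# complete.cases() returns TRUE for rows that contain no NA
clean <- restored[complete.cases(restored), ]
clean
```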
```{r}
# The basic scatter plot: x is "log2FoldChange", y is "pvalue"
ggplot(data=de, aes(x=log2FoldChange, y=pvalue)) + geom_point()
```
This doesn't look quite like a volcano plot yet...<br>
Convert the p-value to -log10(p-value):
```{r}
# Convert directly in the aes()
p <- ggplot(data=de, aes(x=log2FoldChange, y=-log10(pvalue))) + geom_point()
```
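As a quick check of what the transformation does, the sketch below applies -log10() to a few made-up p-values: the smaller the p-value, the larger the transformed value, which is why the most significant genes end up at the top of the plot.

```{r}
# A few example p-values (made up for illustration)
pvals <- c(1, 0.05, 0.001, 1e-10)

# Smaller p-values give larger -log10 values
neg_log10 <- -log10(pvals)
data.frame(pvalue = pvals, neg_log10 = round(neg_log10, 2))
```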
```{r}
# Use a simpler theme
p <- ggplot(data=de, aes(x=log2FoldChange, y=-log10(pvalue))) + geom_point() + theme_minimal()
```
```{r}
# Add vertical lines for log2FoldChange thresholds, and one horizontal line for the p-value threshold
p2 <- p + geom_vline(xintercept=c(-0.6, 0.6), col="red") +
  geom_hline(yintercept=-log10(0.05), col="red")
```
```{r}
# The significantly differentially expressed genes are the ones found in the upper-left and upper-right corners.
# Add a column to the data frame to specify if they are UP- or DOWN-regulated (log2FoldChange positive or negative, respectively)
# add a column filled with "NO" (not differentially expressed) by default
de$diffexpressed <- "NO"
# if log2FoldChange > 0.6 and pvalue < 0.05, set as "UP"
de$diffexpressed[de$log2FoldChange > 0.6 & de$pvalue < 0.05] <- "UP"
# if log2FoldChange < -0.6 and pvalue < 0.05, set as "DOWN"
de$diffexpressed[de$log2FoldChange < -0.6 & de$pvalue < 0.05] <- "DOWN"
# Re-plot but this time color the points with "diffexpressed"
p <- ggplot(data=de, aes(x=log2FoldChange, y=-log10(pvalue), col=diffexpressed)) + geom_point() + theme_minimal()
# Add lines as before...
p2 <- p + geom_vline(xintercept=c(-0.6, 0.6), col="red") +
  geom_hline(yintercept=-log10(0.05), col="red")
```
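The classification rule can be checked on a small made-up data frame before trusting it on the real data. This is only a sketch: the `toy` values are hypothetical, while the thresholds are the same as above.

```{r}
# Hypothetical toy data (illustration only)
toy <- data.frame(log2FoldChange = c(2.1, -1.5, 0.2, 1.0, -0.9),
                  pvalue = c(0.001, 0.01, 0.0001, 0.3, 0.04))

# Same logic as above: default to "NO", then overwrite the rows passing both thresholds
toy$diffexpressed <- "NO"
toy$diffexpressed[toy$log2FoldChange > 0.6 & toy$pvalue < 0.05] <- "UP"
toy$diffexpressed[toy$log2FoldChange < -0.6 & toy$pvalue < 0.05] <- "DOWN"

# Count how many genes fall in each category
table(toy$diffexpressed)
```

A gene must pass both thresholds: the third row has a very small p-value but a small fold change, and the fourth has a large fold change but a high p-value, so both stay "NO".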
```{r}
## Change point color
# 1. by default, colors are assigned to the categories in alphabetical order:
p3 <- p2 + scale_color_manual(values=c("blue", "black", "red"))
# 2. to automate a bit: create a named vector: the values are the colors to be used, the names are the categories they will be assigned to:
mycolors <- c("blue", "red", "black")
names(mycolors) <- c("DOWN", "UP", "NO")
p3 <- p2 + scale_colour_manual(values = mycolors)
```
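The difference between the two approaches can be sketched in base R, without plotting. The example assumes, as stated above, that an unnamed vector of colors is matched to the categories in alphabetical order; `categories`, `unnamed` and `by_position` are hypothetical names used only here.

```{r}
# Categories as they might appear in the diffexpressed column (made-up order)
categories <- c("NO", "UP", "DOWN", "NO")

# Alphabetical order of the categories: "DOWN", "NO", "UP"
sort(unique(categories))

# 1. Unnamed colors are matched by position to that alphabetical order
unnamed <- c("blue", "black", "red")
by_position <- setNames(unnamed, sort(unique(categories)))

# 2. A named vector is matched by name, whatever the order it is written in
mycolors <- c(DOWN = "blue", UP = "red", NO = "black")

# Both give the same color for each category
by_position[categories]
mycolors[categories]
```

The named vector is the safer choice: it keeps working if a category is missing from the data or if the order of the colors is changed.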
```{r}
# Now write the names of the genes beside the points...
# Create a new column "delabel" in de that contains the names of the differentially expressed genes (NA if they are not)
de$delabel <- NA
de$delabel[de$diffexpressed != "NO"] <- de$gene_symbol[de$diffexpressed != "NO"]
ggplot(data=de, aes(x=log2FoldChange, y=-log10(pvalue), col=diffexpressed, label=delabel)) +
  geom_point() +
  theme_minimal() +
  geom_text()
```
```{r}
# Finally, we can organize the labels nicely using the "ggrepel" package and the geom_text_repel() function
# load library
library(ggrepel)
# plot adding up all layers we have seen so far
ggplot(data=de, aes(x=log2FoldChange, y=-log10(pvalue), col=diffexpressed, label=delabel)) +
  geom_point() +
  theme_minimal() +
  geom_text_repel() +
  scale_color_manual(values=c("blue", "black", "red")) +
  geom_vline(xintercept=c(-0.6, 0.6), col="red") +
  geom_hline(yintercept=-log10(0.05), col="red")
```