-
Notifications
You must be signed in to change notification settings - Fork 32
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
added R gff gene length histogram example
- Loading branch information
Showing
1 changed file
with
39 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -460,6 +460,42 @@ cut -f 3 sample.gff | grep 'tRNA' | wc -l | |
|
||
Some more useful one-line unix commands for GFF files: [here](https://github.com/stephenturner/oneliners#gff3-annotations) | ||
|
||
Now we're going to play around with the GFF in R. Specifically, we're interested in looking at the distribution of gene length for all of the genes in the gff file. | ||
|
||
Copy the sample.gff file to your computer using scp or cyberduck: | ||
|
||
``` | ||
scp [email protected]:/nfs/esnitkin/micro612w19_fluxod/shared/data/day1_morn/sample.gff ~/Desktop/ | ||
Note: You can use your choice of folder/path to copy the file instead of “~/Desktop/” | ||
``` | ||
|
||
Open a text file in RStudio and run the following commands: | ||
``` | ||
# Plot histogram of gene lengths | ||
# Read in gff file | ||
gff = read.delim('~/Desktop/sample.gff', | ||
comment.char = '#', # ignore lines that start with '#' | ||
header=F) # no header | ||
# Rename columns | ||
colnames(gff) = c('seqname','source','feature','start','end','score','strand','frame','attribute') | ||
# Look at the head of the gff file | ||
head(gff) | ||
# Get the gene lengths | ||
gene_lengths = gff$end - gff$start | ||
# Plot a histogram of the gene lengths | ||
hist(gene_lengths, | ||
breaks = 100, # 100 cells | ||
xlab = 'Gene Length (bp)', # change x label | ||
main = '') # no title | ||
``` | ||
|
||
What information do you learn about gene lengths in this genome? | ||
|
||
|
||
Plotting genomic coverage in R | ||
------------------------------ | ||
|
@@ -472,7 +508,9 @@ One such parameter is validating how well your sequencing experiment performed a | |
The input for this task is a comma-separated file, which contains average sequencing coverage information i.e average number of reads mapped to each 1000 base pairs in reference genome. You can find this input file in your day1_morn directory by the name, Ecoli_coverage_average_bed.csv | ||
|
||
<!--- | ||
Let’s copy Ecoli_coverage_average_bed.csv file from flux shared directory to your desktop using ‘scp’. ‘scp’ stands for secure copy and is used for securely transferring files between remote host/server(flux) and your local computer system. (Both directions) | ||
Let’s copy Ecoli_coverage_average_bed.csv file from flux shared directory to your desktop using ‘ | ||
’. ‘ | ||
’ stands for secure copy and is used for securely transferring files between remote host/server(flux) and your local computer system. (Both directions) | ||
scp [email protected]:/scratch/micro612w18_fluxod/shared/Ecoli_coverage_average_bed.csv ~/Desktop/ | ||
Note: You can use your choice of folder/path to copy the file instead of “~/Desktop/” | ||
--> | ||
|