dongzhuoer/thesis

Graduation Project at Nankai University

There is a brief introduction to the scientific significance, but I think the most valuable part is the R packages:

  1. rGEO is the core package. It converts GEO microarray data to a standard format, i.e., it maps probes to HUGO gene symbols.
  2. qGSEA is built on rGEO and quickly prepares GSEA input files from GEO. Its main feature is a browser-based GUI: in the simplest case, the user only needs to specify an accession.
  3. hgnc is a dependency of rGEO. In short, rGEO finds where the information lies and hgnc converts that information to a standard format. hgnc may also be useful in other situations, so I made it a separate package.
  4. rGEO.data mainly keeps rGEO from getting too large. It contains several large datasets used in rGEO testing, and the functions that create them.
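
To make the probe-to-symbol step in item 1 concrete, here is a minimal shell sketch. It is not rGEO's implementation or API: the function name map_probes and the two-column annotation layout are assumptions made purely for illustration.

```shell
# Sketch only: map_probes and both file layouts are hypothetical, not part of rGEO.
# $1: annotation TSV, one "probe_id<TAB>gene_symbol" pair per line
# $2: expression TSV, "probe_id<TAB>value..." per line
map_probes() {
    awk -F'\t' -v OFS='\t' '
        # First file: remember which symbol each probe maps to.
        NR == FNR { symbol[$1] = $2; next }
        # Second file: swap the probe ID for its symbol; drop probes with no symbol.
        ($1 in symbol) && symbol[$1] != "" { $1 = symbol[$1]; print }
    ' "$1" "$2"
}
```

Calling `map_probes annot.tsv expr.tsv` prints the expression rows keyed by gene symbol instead of probe ID. Probes that map to several symbols, or several probes that map to one gene, need extra handling that this sketch ignores.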

First reproducible research project

Finally (2022-05-04), I gave up maintaining the R packages. Instead, I froze them in a Docker image (and now watch them decay):

docker pull dongzhuoer/thesis

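That said, data and R code are provided for every figure in the thesis and the presentation.

Unfortunately, the thesis itself currently fails to build (blame bookdown). Once that is fixed, I can add instructions for reproducing it.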

setwd("bookdown")
install.packages(c('magrittr', 'tidyverse', 'cowplot', 'bookdown'))
bookdown::render_book('thesis.Rmd')

Other resources

under the following conventions,

  • lem4 is an alias for ssh workstation
  • output was rendered on 2019-04-03 and is prefixed with #>
  • all code shown with output can be run, except where it needs gsea_output_full.rds (too big)

and online resource

For developers

# four R packages
docker build -t dongzhuoer/thesis r-lib
docker run --rm dongzhuoer/thesis Rscript -e "library('rGEO.data');library('rGEO');library('hgnc');library('rGEO.data')"

# figures
docker run -u `id -u`:`id -g` -v `pwd`:/root -w /root --rm dongzhuoer/thesis:figure Rscript -e "rmarkdown::render('figure.Rmd')"
rm -r figure/ppt
## build Docker image
wget -P figure http://ftp.ubuntu.com/ubuntu/ubuntu/pool/universe/f/fonts-wqy-zenhei/fonts-wqy-zenhei_0.9.45-7ubuntu1_all.deb
docker build -t dongzhuoer/thesis:figure figure 

# update website
git clone -b gh-pages git@github.com:dongzhuoer/thesis.git html
Rscript -e "rmarkdown::render('index.Rmd')"
cp -f *.html html
docker rm -f testsite     
docker run -p 127.0.0.1:1024:80 -v ~/research/thesis/html:/usr/local/apache2/htdocs:ro -dt --name testsite httpd:alpine
cd html && git add *.html && git commit -m "update *.html" && git push && cd ..
rm -rf html 

Research experience

Understand the principle of GSEA, then its input file formats, the meaning of its parameters, and how to interpret its output.
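
As a concrete look at one GSEA input format, the shell sketch below writes an expression table in GCT layout: a `#1.2` version line, a line with the row and column counts, then a header with NAME and Description columns. The function name to_gct and the input layout are assumptions for illustration, and this is not how qGSEA does it.

```shell
# Sketch only: to_gct and its input layout are hypothetical, not part of qGSEA.
# $1: TSV with a header row "NAME<TAB>sample..." and one gene per following row
to_gct() {
    awk -F'\t' -v OFS='\t' '
        # Header row: count samples and add the Description column.
        NR == 1 { ncol = NF - 1; $1 = $1 OFS "Description"; header = $0; next }
        # Data rows: buffer them, since the row count must be printed first.
        { $1 = $1 OFS "na"; rows[++n] = $0 }
        END {
            print "#1.2"
            print n + 0, ncol + 0
            print header
            for (i = 1; i <= n; i++) print rows[i]
        }
    ' "$1"
}
```

The Description column is filled with the placeholder `na` here. The phenotype labels go in a separate file, which this sketch does not cover.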

Explore the architecture of GEO, the kinds of data it provides, and their file formats. The most onerous part is extracting the meaningful information, which led to the development of rGEO.

The next step follows naturally: convert that information to GSEA input and run GSEA (thousands of runs, thanks to Prof. Xie's workstation). In the process I developed qGSEA. I think it is far less significant than rGEO, though many users might prefer it.

Finally, the toughest part: synthesizing the results and drawing conclusions. There was no precedent to follow, so I had to make every decision myself. Part of the process is presented in the thesis, but that is just the tip of the iceberg.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License