Guideline of Programming for Biologists

1 Programming languages

###---Main goal---###
Learn the basic usage of each language
- understand data types, variables, and loops
- learn to use the terminal

Python Programming:
- Basics:
Variables, Data Types, and Operators
Control Flow (if statements, loops)
Functions and Modules
- Intermediate:
Lists, Dictionaries, and Sets
File Handling
Exception Handling
- Advanced:
Object-Oriented Programming (OOP)
Decorators and Generators
Virtual Environments
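
As a rough illustration of several of the topics above (functions, dictionaries, loops, exception handling, and generators), here is a minimal sketch. The record names and sequences are hypothetical toy data, and the function names gc_content and kmers are chosen only for this example.

```python
from collections import Counter


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)  # dictionary-like mapping: base -> count
    return (counts["G"] + counts["C"]) / len(seq)


def kmers(seq, k):
    """Generator: lazily yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


# Hypothetical toy records (name -> sequence), not real data.
records = {"seq1": "ATGCGC", "seq2": "ATATAT", "seq3": ""}

for name, seq in records.items():
    try:
        print(name, round(gc_content(seq), 2))
    except ValueError as err:
        # Exception handling: report the problem and keep going.
        print(name, "skipped:", err)

print(list(kmers("ATGCG", 3)))  # ['ATG', 'TGC', 'GCG']
```

The empty record shows why exception handling matters: one bad input is reported without stopping the loop.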

Programming languages

Python
R
shell

Recommended books and websites

I highly recommend reading "Computing Skills for Biologists: A Toolbox".

Python Programming website (https://www.w3schools.com/python/python_variables.asp)
R Programming website (https://daviddalpiaz.github.io/appliedstats/introduction-to-r.html)
Computing Skills for Biologists: A Toolbox

2 Important environment management software

###---Main goal---###
- learn to use platforms that support programming
- learn to manage environments and packages
- learn to use version control

2-1 Package management software

Anaconda (https://www.anaconda.com/products/individual)
learn to use conda (https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html)

comments: Anaconda is a useful tool for managing Python packages and environments

2-2 Integrated development environment (IDE) software

PyCharm (https://www.jetbrains.com/pycharm/)
learn to create your first project (https://www.jetbrains.com/help/pycharm/creating-and-running-your-first-python-project.html)
learn to create a conda env in PyCharm (https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html#conda-requirements)

comments: PyCharm is a powerful IDE, primarily for Python; other languages such as R are supported through plugins. It is used for writing, testing, and debugging code.

2-3 Version control

Git and GitHub (https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners)
add version control (VCS) integration in PyCharm (https://www.jetbrains.com/help/pycharm/version-control-integration.html)

comments: version control is extremely important because it tracks every change you make to your files

3 Common packages

###---Main goal---###
understand the main functions of each package

3-1 python packages

Biopython (http://biopython.org/DIST/docs/tutorial/Tutorial.html)
pandas
ProDy
IgBLAST
PyIR
git
Snakemake
NumPy
scikit-learn
PyTorch
TensorFlow

comments:
#1 Biopython is a very helpful package for biological applications, so I highly recommend going through its tutorial thoroughly
#2 I also provide some cheat sheets in the text_book folder to assist with programming
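
To get a feel for the kind of work a sequence package takes off your hands, here is a minimal standard-library sketch of reading FASTA records. It is not Biopython's implementation, only an illustration of the file format; the function name read_fasta and the toy records are hypothetical. In practice Biopython's Bio.SeqIO.parse(handle, "fasta") handles this job and many more formats.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


# Hypothetical toy input held in memory instead of a real file.
toy = io.StringIO(">seq1 toy record\nATGC\nGGTA\n>seq2\nTTAA\n")
for name, seq in read_fasta(toy):
    print(name, len(seq))
```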

3-2 R packages

ggplot2 (http://www.sthda.com/english/wiki/ggplot2-essentials)
dplyr
ggpubr

4 Applications

###---Main goal---###
run and test some toy experiments

4-1 Bioinformatics

4-1-1 Deep Mutational Scanning analyses

H3N2 NA antigenic region DMS (https://github.com/Wangyiquan95/NA_EPI)

4-1-2 NGS analyses

H3N2 HA egg-passaging adaptation (https://github.com/Wangyiquan95/HA_egg_passage)
SARS-CoV-2 cell culture-adaptive mutations (https://github.com/nicwulab/SARS-CoV-2_in_vitro_adaptation)

4-2 Machine learning

###---Main goal---###
- Fundamentals:
Supervised vs. Unsupervised Learning
Types of Machine Learning Algorithms (self-supervised, generative, etc.)
Training and Testing Data
- Common Models:
Transformer
GPT
BERT
GNN
Diffusion
- Evaluation and Optimization:
Metrics (Accuracy, Precision, Recall)
Wandb (Visualization)
Hyperparameter Tuning
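
The train/test split and the three metrics listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the labels are made-up toy values, and the helper names train_test_split and classification_metrics are defined here for the example (they only mimic the idea behind the scikit-learn functions of similar names).

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and return (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, not results from any real model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}

train, test = train_test_split(range(8))
print(len(train), len(test))  # 6 2
```

Precision and recall differ here because the toy predictions contain two false positives but only one false negative, which accuracy alone would hide.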

An explainable language model for antibody specificity prediction (https://github.com/Wangyiquan95/HA1)
Deep learning model for antigen identification (https://github.com/nicwulab/SARS-CoV-2_Abs)
H3N2 NA antigenic region DMS regression (https://github.com/Wangyiquan95/NA_EPI)

4-3 Web development

###---Main goal---###
- goals:
Build a website for running a deep learning model on a server (AWS EC2)
1. Set up an AWS EC2 instance
2. Install the necessary software (conda)
3. Develop the web application (Flask)
4. Overview: nginx receives incoming requests and forwards them to gunicorn, which runs the Flask app
flask <--> gunicorn <--> nginx <--> requests
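
To make the chain above concrete, here is a minimal sketch of the application end of it. To stay dependency-free it uses a plain WSGI callable instead of Flask (a Flask application object is itself a WSGI callable, so gunicorn treats both the same way). The file name app.py and the callable name app are assumptions chosen to match the app:app target in the gunicorn service file in the quick references; call_app is a hypothetical helper for testing without a server.

```python
# app.py (assumed file name, so that gunicorn can load it as app:app)
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI application: answer every request with plain text."""
    body = b"Hello, world!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(wsgi_app, path="/"):
    """Call a WSGI app directly with a fake request (no server needed)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


print(call_app(app))  # ('200 OK', b'Hello, world!\n')
```

Calling the app directly like this is a quick way to test the application logic before gunicorn and nginx are involved.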

5 QUICK REFERENCES

ssh connection

If you've never run ssh before, you will need to create a .ssh directory. Run

mkdir -p ~/.ssh && chmod 700 ~/.ssh

Create a config file inside the .ssh directory by

touch ~/.ssh/config
chmod 600 ~/.ssh/config

Add the server info to the config file (replace id with your own user name)

Host wulab
    HostName nicwulab-linux.life.illinois.edu
    User id
    Port 22

To log in to the server, in the terminal, run

ssh wulab
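
The same setup (a private directory, a private config file, and one Host block) can also be scripted. The following is a minimal sketch, assuming POSIX-style permissions; it writes into a temporary directory so that it never touches your real ~/.ssh, and the helper names host_entry and append_entry are hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path


def host_entry(alias, hostname, user, port=22):
    """Format one Host block in ssh config syntax."""
    return (
        f"Host {alias}\n"
        f"    HostName {hostname}\n"
        f"    User {user}\n"
        f"    Port {port}\n"
    )


def append_entry(ssh_dir, entry):
    """Create ssh_dir (mode 700) and append entry to its config (mode 600)."""
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    config = ssh_dir / "config"
    with open(config, "a", encoding="utf-8") as fh:
        fh.write(entry)
    os.chmod(config, 0o600)
    return config


# Demo in a throwaway directory; "id" is the placeholder user name from above.
with tempfile.TemporaryDirectory() as tmp:
    cfg = append_entry(
        Path(tmp) / ".ssh",
        host_entry("wulab", "nicwulab-linux.life.illinois.edu", "id"),
    )
    print(cfg.read_text())
    print(oct(stat.S_IMODE(cfg.stat().st_mode)))
```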

move files

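To copy files from your computer to a (remote) server, run the following (change the paths accordingly; <user>@<server> is a placeholder for your own login, and the wulab alias from the ssh config works as well)

scp ~/local/path <user>@<server>:/home/server/path

To download files from the server, run

scp -r <user>@<server>:/home/server/path ~/local/path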

install miniconda

Download miniconda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Install by

bash Miniconda3-latest-Linux-x86_64.sh

Create new env by

conda create --name Env_name python=3.9
conda activate Env_name

Remove an env by

conda remove --name Env_name --all

Create a new env from a yml file by

conda env create -f environment.yml

Save a conda env as a yml file by

conda activate Env_name
conda env export > path/to/environment.yml
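
After activating an environment it is worth confirming that Python is really running from it. A minimal sketch, assuming that conda sets the CONDA_DEFAULT_ENV variable on activation (the function name describe_environment is hypothetical):

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the interpreter path does not point inside the environment you just activated, the wrong Python is first on your PATH.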

git

Initialize a repository and connect it to a GitHub repo by

git init
git add README.md
git commit -m "README.md"
git branch -M main
git remote add origin <repository-url>

Stop tracking a file (the file itself stays on disk) by

git rm --cached file.csv
git commit -m "Removed files"

Push to GitHub by

git push -u <remote> <branch>

Fetch and merge from GitHub by

git fetch <remote>
git merge <remote>/<branch>

jupyter lab/notebook

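Connect to the server and start Jupyter by (run either the notebook or the lab command, not both)

ssh wulab
jupyter notebook --no-browser --port=8080
jupyter lab --no-browser --port=8080

Open another local terminal and forward the port by (<user>@<server> is a placeholder for your own login; the wulab alias from the ssh config works as well)

ssh -N -L 8080:localhost:8080 <user>@<server>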

Copy the Jupyter lab URL that appears, and paste it into your web browser.
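
If the page does not load, a quick way to see whether the forwarded port is reachable at all is to try a plain TCP connection. A minimal sketch (the function name port_is_open is hypothetical; port 8080 matches the commands above):

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


# True only while the ssh tunnel (or any other service) is listening locally.
print(port_is_open("localhost", 8080))
```

A result of False usually means the ssh -N -L command is not running or uses a different port.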

setup service for gunicorn

sudo nano /etc/systemd/system/helloworld.service

Then add this into the file.

[Unit]
Description=Gunicorn instance for a simple hello world app
After=network.target
[Service]
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/helloworld
ExecStart=/home/ubuntu/helloworld/venv/bin/gunicorn -b localhost:8000 app:app
Restart=always
[Install]
WantedBy=multi-user.target

Then enable the service:

sudo systemctl daemon-reload
sudo systemctl start helloworld
sudo systemctl enable helloworld

Nginx Webserver route request to gunicorn

Install Nginx

sudo apt-get install nginx

Start the Nginx service, then open the public IP address of your EC2 instance in a browser to see the default Nginx landing page

sudo systemctl start nginx
sudo systemctl enable nginx

Edit the default file in the sites-available folder.

sudo nano /etc/nginx/sites-available/default

Add the following code at the top of the file (below the default comments)

upstream flaskhelloworld {
    server 127.0.0.1:8000;
}

Add a proxy_pass to flaskhelloworld at location /

location / {
    proxy_pass http://flaskhelloworld;
}

Restart Nginx

sudo systemctl restart nginx
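
To check the chain from the server itself, you can request the app once directly from gunicorn and once through Nginx. A minimal sketch, assuming gunicorn listens on port 8000 as configured above and Nginx on the default port 80 (the function name http_status is hypothetical):

```python
import urllib.error
import urllib.request

# An empty ProxyHandler ignores any proxy environment variables, so local
# addresses are contacted directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, or similar


for label, url in [("gunicorn", "http://127.0.0.1:8000/"),
                   ("nginx", "http://127.0.0.1/")]:
    print(label, http_status(url))
```

A 502 from Nginx combined with None from gunicorn typically means Nginx is running but cannot reach the gunicorn service.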

6 Tools

local igblast

Somatic Hypermutation(SHM) calling

CDRH3 clustering

Protein clustering

PDB renumbering

NGS data processing-1 (DMS)

NGS data processing-2 (Genome)

Language modeling

Transfer learning
