
Wangyiquan95/Biological_Programming_from_scratch

Programming Guide for Biologists

Contents

1 Programming languages

###---Main goal---###
Learn the basic usage of each language
- understand data types, variables, and loops
- learn to use the terminal

Python Programming:
- Basics:
Variables, Data Types, and Operators
Control Flow (if statements, loops)
Functions and Modules
- Intermediate:
Lists, Dictionaries, and Sets
File Handling
Exception Handling
- Advanced:
Object-Oriented Programming (OOP)
Decorators and Generators
Virtual Environments

Programming languages

  • Python
  • R
  • shell

Recommended books and websites

I highly recommend reading "Computing Skills for Biologists: A Toolbox".

2 Important environment management software

###---Main goal---###
- learn to use supporting programming platforms (IDEs)
- learn to manage environments and packages
- learn to use version control

2-1 Package management software

comments: Anaconda is a useful tool for managing Python packages and environments.

2-2 Integrated development environment (IDE) software

comments: PyCharm is a powerful IDE that supports several programming languages (Python, R, etc.). It is used for writing, testing, and debugging code.

2-3 Version control

comments: Version control is extremely important because it tracks every change you make to your files and lets you go back to earlier versions.

3 Common packages

###---Main goal---###
understand the main functions of each package

3-1 Python packages

comments:
#1 Biopython is a very helpful package for biological applications, so I highly recommend going through its tutorial thoroughly.
#2 I also provide some cheat sheets in text_book to help with programming.

3-2 R packages

4 Applications

###---Main goal---###
run and test some toy experiments

4-1 Bioinformatics

4-1-1 Deep Mutational Scanning analyses

4-1-2 NGS analyses

4-2 Machine learning

###---Main goal---###
- Fundamentals:
Supervised vs. Unsupervised Learning
Types of Machine Learning Algorithms (self-supervised, generative, etc.)
Training and Testing Data
- Common Models:
Transformer
GPT
BERT
GNN
Diffusion
- Evaluation and Optimization:
Metrics (Accuracy, Precision, Recall)
Wandb (Weights & Biases, for visualization)
Hyperparameter Tuning
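For binary classification, the three metrics listed above are computed from confusion-matrix counts, where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives:

```latex
\begin{aligned}
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
\mathrm{Recall}    &= \frac{TP}{TP + FN}
\end{aligned}
```

As a made-up illustration, with TP = 8, FP = 2, FN = 4 and TN = 86, accuracy is 94/100 = 0.94, precision is 8/10 = 0.80, but recall is only 8/12 ≈ 0.67, which is why accuracy alone can be misleading on imbalanced data.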

4-3 Web development

###---Main goal---###
- goals:
Build a website for running a deep learning model on a server (AWS EC2)
1. Set up an AWS EC2 instance
2. Install the necessary software (conda)
3. Develop the web application (Flask)
4. Overview of how a request is served:
flask (app) --> gunicorn (WSGI server) <--> nginx (reverse proxy) <--> requests (from the browser)

5 QUICK REFERENCES

ssh connection

If you've never run ssh before, you will need to create a .ssh directory. Run

mkdir -p ~/.ssh && chmod 700 ~/.ssh

Create a config file inside the .ssh directory by running

touch ~/.ssh/config
chmod 600 ~/.ssh/config

Add the server info to the config file (replace <your_user_id> with your username on the server)

Host wulab
    HostName nicwulab-linux.life.illinois.edu
    User <your_user_id>
    Port 22

To log in to the server, in the terminal, run

ssh wulab
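The three setup steps above can be wrapped in one idempotent shell function. This is a sketch, and the function name add_ssh_host and its argument order are hypothetical: it creates the directory and file with the same permissions as above, and appends a Host block only when that alias is not already in the file, so running it twice does not duplicate the entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a Host block to an ssh config file,
# but only if that alias is not already defined there.
# Usage: add_ssh_host <config_file> <alias> <hostname> <user> [port]
add_ssh_host() {
  local config="$1" alias="$2" host="$3" user="$4" port="${5:-22}"
  local dir
  dir="$(dirname "$config")"

  # ssh rejects a config that others can write to, so use the same
  # permissions as the manual steps (700 for the dir, 600 for the file)
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$config" && chmod 600 "$config" || return 1

  # Skip if a "Host <alias>" line already exists
  if grep -qE '^Host[[:space:]]+'"$alias"'([[:space:]]|$)' "$config"; then
    echo "Host ${alias} already present in ${config}"
    return 0
  fi

  # Append the block in the same layout as the example above
  printf '\nHost %s\n    HostName %s\n    User %s\n    Port %s\n' \
    "$alias" "$host" "$user" "$port" >> "$config"
  echo "Added Host ${alias} to ${config}"
}
```

For example, add_ssh_host "$HOME/.ssh/config" wulab nicwulab-linux.life.illinois.edu your_user_id (replace your_user_id with your server username), then connect with ssh wulab.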

move files

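To copy a file from your computer to a remote server, run the command below (change the paths and <user>@<server> accordingly; add -r to copy a directory). If the Host entry above is set up, wulab: can be used in place of <user>@<server>:.

scp ~/local/path <user>@<server>:/home/server/path

or, to download files from the server, run

scp -r <user>@<server>:/home/server/path ~/local/path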

install miniconda

Download miniconda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Install by

bash Miniconda3-latest-Linux-x86_64.sh

Create a new env by

conda create --name my_env python=3.9
conda activate my_env

Remove an env by

conda remove --name my_env --all

Create a new env from a yml file by

conda env create -f environment.yml
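The command above assumes an environment.yml already exists. A minimal one can be written from the shell as sketched below; the function name write_env_yml, the default name my_env, the conda-forge channel, and the pip and biopython entries are illustrative assumptions, so edit the dependency list to fit your project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal conda environment file.
# Usage: write_env_yml <output_path> [env_name]
write_env_yml() {
  local out="$1" name="${2:-my_env}"
  # Unquoted EOF so ${name} is expanded; nothing else needs expansion
  cat > "$out" <<EOF
name: ${name}
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - biopython
EOF
}
```

For example: write_env_yml environment.yml my_env && conda env create -f environment.yml.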

Export the active conda env to a yml file by

conda activate my_env
conda env export > path/to/environment.yml

git

Initialize a repo and connect it to a GitHub repo by

git init
git add README.md
git commit -m "README.md"
git branch -M main
git remote add origin <repository-url>

Stop tracking files in git (the files stay on your disk) by

git rm --cached file.csv
git commit -m "Removed files"
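The init and untrack commands above can be rehearsed in a throwaway directory before touching a real project. The sketch below is a hypothetical demo (the function names, file contents, and commit identity are placeholders): it commits file.csv, then untracks it with git rm --cached and lists it in .gitignore so that a later git add . does not pick it up again. The file itself stays on disk.

```shell
#!/usr/bin/env bash
# Hypothetical demo: untrack a committed file without deleting it from disk.
# Usage: git_untrack_demo <new_directory>
git_untrack_demo() {
  local repo="$1"
  mkdir -p "$repo" || return 1

  # Placeholder identity so commits work on a fresh machine (demo only)
  _demo_git() {
    git -C "$repo" -c user.name="Demo User" -c user.email="demo@example.com" "$@"
  }

  _demo_git init -q
  echo "# demo" > "$repo/README.md"
  printf 'gene,count\nTP53,10\n' > "$repo/file.csv"
  _demo_git add README.md file.csv
  _demo_git commit -q -m "Add README.md and file.csv"
  _demo_git branch -M main

  # Remove file.csv from the index only, then ignore it from now on
  _demo_git rm -q --cached file.csv
  echo "file.csv" >> "$repo/.gitignore"
  _demo_git add .gitignore
  _demo_git commit -q -m "Removed files"
}
```

Run it as git_untrack_demo /tmp/git_demo, then compare git -C /tmp/git_demo ls-files with ls /tmp/git_demo: file.csv is on disk but no longer tracked.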

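Push your commits to GitHub by running the command below (e.g. git push -u origin main; -u sets the upstream branch, so later a plain git push is enough)

git push -u <remote> <branch>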

Fetch and merge changes from GitHub by

git fetch <remote>
git merge <remote>/<branch>
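To try push, fetch, and merge without a GitHub account, a local bare repository can stand in for the remote. In this hypothetical sketch, the directory names laptop and server, the function names, and the commit identity are placeholders; in a real project, origin would point at the GitHub <repository-url> instead of the bare-repo path.

```shell
#!/usr/bin/env bash
# Hypothetical demo: push/fetch/merge against a local bare repo acting as GitHub.
# Usage: git_remote_demo <new_directory>
git_remote_demo() {
  local root="$1"
  mkdir -p "$root" || return 1

  # Placeholder identity for demo commits
  _demo_commit_git() {
    git -c user.name="Demo User" -c user.email="demo@example.com" "$@"
  }

  # 1. A bare repository stands in for the GitHub remote
  git init -q --bare "$root/remote.git"

  # 2. "Laptop" repo: first commit, then push and set the upstream with -u
  git init -q "$root/laptop"
  echo "v1" > "$root/laptop/README.md"
  git -C "$root/laptop" add README.md
  _demo_commit_git -C "$root/laptop" commit -q -m "README.md"
  git -C "$root/laptop" branch -M main
  git -C "$root/laptop" remote add origin "$root/remote.git"
  git -C "$root/laptop" push -q -u origin main

  # 3. "Server" clone: check out main, change a file, push it back
  git clone -q -b main "$root/remote.git" "$root/server"
  echo "v2" >> "$root/server/README.md"
  _demo_commit_git -C "$root/server" commit -q -am "Update README.md"
  git -C "$root/server" push -q origin main

  # 4. Back on the laptop: fetch the remote, then merge its main branch
  git -C "$root/laptop" fetch -q origin
  git -C "$root/laptop" merge -q origin/main
}
```

By default, git pull performs the same fetch followed by a merge in a single command.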

jupyter lab/notebook

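Connect to the server and start Jupyter Lab by

ssh wulab
jupyter lab --no-browser --port=8080

or, for the classic notebook interface, run this instead (start only one of the two on a given port)

jupyter notebook --no-browser --port=8080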

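Open another terminal on your local computer and forward the port by

ssh -N -L 8080:localhost:8080 <user>@<server>

(with the ssh config above, ssh -N -L 8080:localhost:8080 wulab also works). Then copy the Jupyter URL (including its token) printed in the server terminal and paste it into your web browser.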

Set up a systemd service for gunicorn

sudo nano /etc/systemd/system/helloworld.service

Then add the following to the file (adjust User, the paths, and app:app to match your project).

[Unit]
Description=Gunicorn instance for a simple hello world app
After=network.target
[Service]
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/helloworld
ExecStart=/home/ubuntu/helloworld/venv/bin/gunicorn -b localhost:8000 app:app
Restart=always
[Install]
WantedBy=multi-user.target
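In ExecStart, app:app tells gunicorn to load the object named app from the module app.py in the WorkingDirectory. A minimal module matching that assumption could be created as sketched below; the function name, route, and message are hypothetical, and Flask and gunicorn must already be installed in the venv (e.g. /home/ubuntu/helloworld/venv/bin/pip install flask gunicorn).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the Flask module that "gunicorn app:app" expects.
# Usage: write_flask_app <project_dir>
write_flask_app() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' keeps the Python source literal (no shell expansion)
  cat > "$dir/app.py" <<'EOF'
from flask import Flask

# gunicorn looks up this module-level name: "app:app" = module app, object app
app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello, World!"


if __name__ == "__main__":
    # Flask development server for local testing only; gunicorn serves it in production
    app.run(host="127.0.0.1", port=8000)
EOF
}
```

For example, write_flask_app /home/ubuntu/helloworld. Before starting the service, a quick manual check is to run venv/bin/gunicorn -b localhost:8000 app:app inside /home/ubuntu/helloworld and then curl http://localhost:8000 from a second terminal.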

Then reload systemd, start the service, and enable it so it starts at boot:

sudo systemctl daemon-reload
sudo systemctl start helloworld
sudo systemctl enable helloworld

Nginx web server: route requests to gunicorn

Install Nginx

sudo apt-get update
sudo apt-get install nginx

Start the Nginx service, then open the public IP address of your EC2 instance in a browser to see the default Nginx landing page (the instance's security group must allow inbound HTTP on port 80).

sudo systemctl start nginx
sudo systemctl enable nginx

Edit the default file in the sites-available folder.

sudo nano /etc/nginx/sites-available/default

Add the following code at the top of the file (below the default comments)

upstream flaskhelloworld {
    server 127.0.0.1:8000;
}

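Inside the existing server block, replace the default location / block with a proxy_pass to flaskhelloworld (keep only one location / block, otherwise Nginx reports a duplicate location)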

location / {
    proxy_pass http://flaskhelloworld;
}

Restart Nginx

sudo systemctl restart nginx

6 Tools

coming soon ~

Local IgBLAST

Somatic hypermutation (SHM) calling

CDRH3 clustering

Protein clustering

PDB renumbering

NGS data processing-1 (DMS)

NGS data processing-2 (Genome)

Language modeling

Transfer learning

About

I created this repo to share how biology students can learn programming from scratch.
