01 Install PhylUp

Martha Kandziora edited this page May 2, 2021 · 17 revisions

To get started with PhylUp, follow this installation guide.

1. download PhylUp using the command line:

  • as a normal package: wget https://github.com/mkandziora/PhylUp/archive/master.zip
  • as a git repository: git clone 'git@github.com:mkandziora/PhylUp.git'

2. install PhylUp and its Python requirements:

run from within the PhylUp main folder:

  • python setup.py install
  • pip install -r requirements.txt

3. install the external dependencies:

On a Linux machine, a bash file is available that installs all external requirements automatically. Required software and databases are listed below, including a step-by-step guide for the installation of the databases. For the required software, check the respective website or the link above.

  • PaPaRa - alignment tool
  • RAxML-NG - tree estimation program
  • BLAST+ - version 2.9 or higher is required. It is needed for filter runs and when using local BLAST databases. Setup and installation information can be found here.
  • EPA-NG - places new sequences into the phylogeny
  • gappa - transforms the output from EPA-NG into a readable output
  • MAFFT - alignment tool, used if a single sequence is provided as input

Make sure the programs are accessible from everywhere by adding them to your PATH using the command line:

  • UNIX: export PATH=$PATH:/path/to/my/program
  • Windows: set PATH=%PATH%;C:\path\to\my\program
  • macOS: export PATH=$PATH:~/path/to/program
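
To confirm that the PATH changes took effect, a small check such as the following sketch can help. The executable names in REQUIRED_TOOLS are assumptions (they are not taken from PhylUp and may differ depending on how each program was built or installed), so adjust them to your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["papara", "raxml-ng", "blastn", "epa-ng", "gappa", "mafft"]


def missing_tools(names, path=None):
    """Return the names that cannot be found on PATH (or on `path`, if given)."""
    return [name for name in names if shutil.which(name, path=path) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required programs were found on PATH.")
```

Run it in the same terminal session in which you set the PATH, since an export only affects the current shell unless it is added to your shell startup file.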

4. install a local instance of the BLAST database:

General information about the BLAST database can be found here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html.

  • Install the blast database:

    To install the BLAST database on a Linux machine, do the following:

    • open a terminal
    • cd /to/the/folder/of/your/future/blastdb
    • wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*' # downloads all compressed nt files
    • cat *.tar.gz | tar -xvzf - -i # macOS tar does not support the -i flag; install GNU tar with Homebrew (brew install gnu-tar) and use gtar instead of tar
    • blastdbcmd -db nt -info # checks if it works
    • rm *.tar.gz*

    'nt' refers to the nucleotide database.
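
Because the nt download is large, it can be worth checking the archives before extracting them. The sketch below is not part of PhylUp. It assumes that each downloaded archive has a matching .md5 file next to it (for example nt.00.tar.gz and nt.00.tar.gz.md5) and that the .md5 file starts with the hex digest, as written by md5sum. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(archive, md5_file=None):
    """Compare an archive against the checksum stored in its .md5 file.

    Assumes the .md5 file begins with the hex digest, followed by
    whitespace and the file name.
    """
    archive = Path(archive)
    md5_file = Path(md5_file) if md5_file else Path(str(archive) + ".md5")
    expected = md5_file.read_text().split()[0].lower()
    return md5_of(archive) == expected


def corrupt_archives(folder="."):
    """List the *.tar.gz files in `folder` whose checksum does not match.

    Expects a .md5 file next to every archive.
    """
    return [p.name for p in sorted(Path(folder).glob("*.tar.gz"))
            if not archive_is_intact(p)]
```

Calling corrupt_archives() in the blastdb folder before the extraction step returns an empty list if every archive matches its checksum. Any file it names should be downloaded again.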

  • Install the taxonomy database:

    Install the NCBI taxonomy database, which is used to retrieve taxon information from BLAST searches, into the same directory as your blastdb from the previous step.

    • cd /to/the/folder/of/your/blastdb
    • wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz' # Download the taxdb archive
    • gunzip -cd taxdb.tar.gz | (tar xvf - ) # Install it in the BLASTDB directory
    • rm *.tar.gz*
  • Install the taxonomic rank database:

    • cd /to/your/folder/of/PhylUp
    • wget 'ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
    • gunzip -cd taxdump.tar.gz | (tar xvf - names.dmp nodes.dmp)
    • move files into data/:
      • mv names.dmp data/
      • mv nodes.dmp data/
    • rm taxdump.tar.gz
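
To check that nodes.dmp was extracted correctly, the file can be read with a few lines of Python. This is a minimal sketch, not PhylUp's own parser. It relies on the NCBI dump layout, in which fields are separated by a tab, a pipe and a tab, and the first three fields are the taxon id, the parent taxon id and the rank. The function name parse_nodes is hypothetical.

```python
def parse_nodes(lines):
    """Map each taxon id to (parent id, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        # Drop the trailing tab, pipe and newline, then split on the delimiter.
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tax_id, parent_id, rank = fields[:3]
        nodes[int(tax_id)] = (int(parent_id), rank)
    return nodes


# Example usage, run from the PhylUp main folder:
# with open("data/nodes.dmp") as handle:
#     nodes = parse_nodes(handle)
# print(len(nodes), "taxa loaded")
```

If the file loads without errors and the number of taxa is large, the rank database is most likely in place.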

  • Update the databases:

    The databases need to be updated regularly. The program checks the dates of your databases and reminds you to update them. If your analysis runs in interactive mode and your databases are outdated, you will be asked whether you want to update them. Please note that the interactive mode does not work on remote machines with scheduling systems. To stop the program from asking, change the following line in your analysis file from conf = ConfigObj(configfi, workdir) to conf = ConfigObj(configfi, workdir, interactive=False).

    The databases can be updated automatically:

    • run python ./update_databases.py
    • the BLAST database will only be updated if it is older than 90 days; the other databases will be updated regardless of their age.
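
An age check like the 90-day rule above can be illustrated with a file's modification time. This is only a sketch of the idea. It is not how update_databases.py is implemented, which may use a different criterion, and the function name and the example file path are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_older_than(path, days=90, now=None):
    """Return True if the file was last modified more than `days` days ago."""
    now = time.time() if now is None else now
    age_in_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_in_days > days


# Example usage (the path is a placeholder for a file in your blastdb folder):
# if is_older_than("/to/the/folder/of/your/blastdb/some_database_file"):
#     print("The BLAST database is older than 90 days; consider updating it.")
```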

    If you want to update the databases by hand:

    • blast db: run update_blastdb.pl nt
    • taxonomy db: run update_blastdb.pl taxdb
    • rank db: repeat the steps listed under 'install the taxonomic rank database'

  • To test the installation, run:

    • python3 ./tests/tests_setup.py
    • pytest tests/test_*

optional: install a virtual environment

This is very useful if you want to run PhylUp on a cluster and/or do not want to change the Python packages already installed on your computer. A virtual environment installs the required packages locally.

  • pip install virtualenv
  • virtualenv -p python3 NameOfYourENV # you may need to just say python instead of python3, depending on your system

To use the virtual environment, you need to activate it before doing anything else. Activate it before you install any software into it, and activate it again each time before running PhylUp.

source NameOfYourENV/bin/activate

and to deactivate it: deactivate
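
If you are unsure whether the environment is currently active, the interpreter itself can tell you. The following sketch uses only the Python standard library. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a virtual environment."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtualenv())
```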