- Docker (see https://docs.docker.com/install/ if it is not already installed on your system)
- A 64-bit Linux operating system. We validated MAGset on the following distributions:
  - Ubuntu 18.04 and 20.04
  - CentOS 7.6 and 8.3
  - Fedora 33
  - Debian 9 and 10 (Debian 10 has an open issue involving Docker; please follow the steps in docker/for-linux#58 (comment) to use MAGset.)
- About 12 GB of free disk space (mostly for the Docker image)
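Before installing, these requirements can be checked with a small shell script such as the one below. It is a hypothetical helper, not part of MAGset: the function names are illustrative, it assumes standard Linux tools (`df`, `getconf`, `awk`), and it assumes the Docker image will be stored on the filesystem holding the path you pass as the first argument (by default Docker keeps its data under `/var/lib/docker` on Linux).

```shell
#!/usr/bin/env bash
# Hypothetical pre-installation check for the MAGset requirements listed above.
# It only reports what it finds; it does not install or change anything.

# Succeed if the docker client is on the PATH.
has_docker() {
  command -v docker >/dev/null 2>&1
}

# Succeed if the system word size is 64 bits.
is_64bit() {
  [ "$(getconf LONG_BIT 2>/dev/null)" = "64" ]
}

# Print the free space, in whole GB, of the filesystem holding the given path.
# df -P prints one line per filesystem without wrapping; -k uses 1024-byte
# blocks; column 4 is the available space. Prints nothing if df fails.
free_gb() {
  df -Pk "${1:-.}" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

report() {
  local path="${1:-.}" gb
  if has_docker; then echo "docker: found"; else echo "docker: MISSING"; fi
  if is_64bit; then echo "64-bit: yes"; else echo "64-bit: NO"; fi
  gb=$(free_gb "$path")
  gb=${gb:-0}   # df failed (e.g. the path does not exist): report 0 GB
  if [ "$gb" -ge 12 ]; then
    echo "free space on $path: ${gb} GB (ok)"
  else
    echo "free space on $path: ${gb} GB (about 12 GB recommended)"
  fi
}

report "$@"
```

For example, passing `/var/lib/docker` as the argument (when that directory exists) checks the disk that will actually hold the image, rather than the current directory.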
1. Download the main script:
   ```shell
   curl -OL https://github.com/LaboratorioBioinformatica/magset/releases/download/1.5.2/run-magset.sh
   ```
   or
   ```shell
   wget https://github.com/LaboratorioBioinformatica/magset/releases/download/1.5.2/run-magset.sh
   ```
2. Make the script executable:
   ```shell
   chmod +x run-magset.sh
   ```
That's it! Please test your installation by following the Quick start tutorial.
Execution time and memory usage vary with the size of the data, the input file format (GBFF or FASTA), and whether the negative GRIs are validated against the raw data (MAGcheck module).
Running the software with GBFF files increases memory usage and run time considerably, because for this file type the pipeline executes extra steps (pangenome and annotation).
In general, 8 GB of memory and 4 threads are enough to compare 4 bacterial genomes in a reasonable time.
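To see how a machine compares with this rough guideline, a sketch like the one below reads the total memory from `/proc/meminfo` and the thread count from `nproc`. It is not part of MAGset; the function names and the rounding to whole GB are illustrative assumptions, and the thresholds are only the guideline quoted above, not hard limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare this machine with the rough sizing guideline
# above (8 GB of memory and 4 threads for about 4 bacterial genomes).

# Total physical memory in GB, rounded to the nearest whole GB.
# /proc/meminfo reports MemTotal in kB, usually a little below the installed
# RAM, so rounding (rather than truncating) keeps an 8 GB machine at 8.
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# Number of processing units available to the current process.
cpu_threads() {
  nproc
}

mem=$(mem_total_gb)
threads=$(cpu_threads)
echo "memory: ${mem} GB, threads: ${threads}"
if [ "$mem" -lt 8 ] || [ "$threads" -lt 4 ]; then
  echo "below the suggested 8 GB / 4 threads: runs may take longer, especially with GBFF input"
else
  echo "meets the suggested 8 GB / 4 threads"
fi
```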
The tables below show examples of run time and memory consumption on Ubuntu 20.04 running in the cloud (DigitalOcean, Basic plan: 8 GB RAM / 4 CPUs / 160 GB SSD):
- Data:
  - Genomes compared: 4 genomes of approximately 3 MB each
  - MAGcheck raw data: Illumina paired-end reads, 50 GB

| | FASTA without MAGcheck | FASTA with MAGcheck | GBK without MAGcheck | GBK with MAGcheck |
|---|---|---|---|---|
| Time (hh:mm) | 00:05 | 00:45 | 01:00 | 01:45 |
| Memory | 600 MB | 600 MB | 3.5 GB | 3.5 GB |

- Data:
  - Genomes compared: 10 genomes of approximately 3 MB each
  - No raw data

| | FASTA without MAGcheck | GBK without MAGcheck |
|---|---|---|
| Time (hh:mm) | 00:35 | 04:00 |
| Memory | 850 MB | 4.5 GB |