Commit 5fd8420

CUDA_ARCH Makefile macro, zlib enabled by default, improved documentation

1 parent c3b3879 · 16 files changed · +162 -85 lines changed

Makefile

Lines changed: 16 additions & 11 deletions

```diff
@@ -12,18 +12,22 @@ DIALECT = -std=c++14
 WARNINGS = -Wall -Wextra -Wpedantic
 NVCC_WARNINGS = -Xcompiler="-Wall -Wextra"
 OPTIMIZATION = -O3
-INCLUDES = -lz
-#-march native -fomit-frame-pointer
-# CUB = -I<path-to-cub>
-NVCC_FLAGS = $(CUB) -arch=sm_70 -lineinfo --expt-relaxed-constexpr --extended-lambda
+INCLUDE =
 
-CXXFLAGS = $(INCLUDES) $(MACROS) $(DIALECT) $(WARNINGS)
+NVCC_FLAGS = $(CUB) -arch=$(CUDA_ARCH) -lineinfo --expt-relaxed-constexpr --extended-lambda
+CXXFLAGS = $(INCLUDE) $(MACROS) $(DIALECT) $(WARNINGS)
 
-CUDA_FLAGS = $(NVCC_FLAGS) $(INCLUDES) $(MACROS) $(DIALECT) $(NVCC_WARNINGS)
+LDFLAGS = -pthread
 
-LDFLAGS = -pthread $(INCLUDES)
+CUDA_FLAGS = $(NVCC_FLAGS) $(INCLUDE) $(MACROS) $(DIALECT) $(NVCC_WARNINGS)
+CUDA_LDFLAGS = $(NVCC_FLAGS) -Xcompiler="-pthread"
 
-CUDA_LDFLAGS = $(NVCC_FLAGS) $(INCLUDES) -Xcompiler="-pthread"
+# if MC_ZLIB=NO => deactivate zlib support
+ifeq ($(MC_ZLIB),NO)
+LDFLAGS += -lz
+CUDA_LDFLAGS += -lz
+MACROS += -DMC_NO_ZLIB
+endif
 
 
 #--------------------------------------------------------------------
```
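The new `ifeq` block passes `-DMC_NO_ZLIB` through `MACROS`, which both `CXXFLAGS` and `CUDA_FLAGS` include, so the switch reaches the sources as a preprocessor macro. As committed, `-lz` is appended inside the same `ifeq ($(MC_ZLIB),NO)` branch, which looks inverted relative to the comment: a default build (no `MC_ZLIB` given) would compile zlib support without linking zlib. Also keep in mind that in GNU make a variable set on the command line (e.g. `make MACROS="..."`) overrides ordinary makefile assignments, including `+=`, unless the makefile uses `override`. The following is a minimal, hypothetical C++14 sketch of how a source file could react to the macro; `gzip_input_supported`, `ends_with` and `can_read_input` are illustrative names, not MetaCache code.

```cpp
// Hypothetical sketch (not MetaCache source): reacting to -DMC_NO_ZLIB,
// which the Makefile adds to MACROS when invoked as `make MC_ZLIB=NO`.
#include <cassert>
#include <string>

#ifdef MC_NO_ZLIB
// zlib support compiled out: gzipped input has to be rejected.
constexpr bool gzip_input_supported = false;
#else
// Default build: a real reader would call zlib here, so the final
// link step would need -lz.
constexpr bool gzip_input_supported = true;
#endif

// True if `s` ends with `suffix`.
inline bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Plain FASTA/FASTQ is always readable; ".gz" files only with zlib support.
inline bool can_read_input(const std::string& filename) {
    if (ends_with(filename, ".gz")) return gzip_input_supported;
    return true;
}
```

Compiled without any extra flags, `can_read_input("reads.fq.gz")` returns `true`; compiled with `-DMC_NO_ZLIB` it returns `false`, while plain `reads.fq` is accepted either way.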
```diff
@@ -125,8 +129,6 @@ CUDA_COMPILE = $(CUDA_COMPILER) $(CUDA_FLAGS) -c $< -o $@
 #--------------------------------------------------------------------
 # main targets
 #--------------------------------------------------------------------
-.PHONY: all clean
-
 release:
 	$(MAKE) release_dummy DIR=$(REL_DIR) ARTIFACT=$(REL_ARTIFACT) MACROS=$(MACROS)
 
@@ -146,7 +148,6 @@ release_dummy: $(REL_DIR) $(REL_ARTIFACT)
 debug_dummy: $(DBG_DIR) $(DBG_ARTIFACT)
 profile_dummy: $(PRF_DIR) $(PRF_ARTIFACT)
 
-
 gpu_release:
 	$(MAKE) gpu_release_dummy DIR=$(REL_CUDA_DIR) CUDA_ARTIFACT=$(REL_CUDA_ARTIFACT) MACROS="$(MACROS) -DGPU_MODE"
 
@@ -173,7 +174,11 @@ gpu_debug_dummy: $(DBG_CUDA_DIR) $(DBG_CUDA_ARTIFACT)
 gpu_profile_dummy: $(PRF_CUDA_DIR) $(PRF_CUDA_ARTIFACT)
 
 
+# phony targets
+.PHONY: all clean gpu cpu
 all: release debug profile
+cpu: release
+gpu: gpu_release
 
 clean :
 	rm -rf build_*
```

README.md

Lines changed: 56 additions & 29 deletions

````diff
@@ -6,22 +6,29 @@ MetaCache is a classification system for mapping genomic sequences (short reads,
 
 For an independend comparison to other tools in terms of classification accuracy see the [LEMMI](https://lemmi.ezlab.org) benchmarking site.
 
-MetaCache's CPU version classifies around 60 Million reads (of length 100) per minute against all complete bacterial, viral and archaea genomes from NCBI RefSeq Release 97 running with 88 threads on a workstation with 2 Intel(R) Xeon(R) Gold 6238 CPUs.
+**MetaCache's CPU** version classifies around 60 Million reads (of length 100) per minute against all complete bacterial, viral and archaea genomes from NCBI RefSeq Release 97 running with 88 threads on a workstation with 2 Intel(R) Xeon(R) Gold 6238 CPUs.
 
-MetaCache's [GPU version](docs/gpu_version.md) classifies around 300 Million reads (of length 100) per minute against all complete bacterial, viral, fungal and archaea genomes from NCBI RefSeq Release 202 running on a workstation with 4 NVIDIA(R) Tesla(R) V100 GPUs (32 GB model).
+**MetaCache's [GPU version](docs/gpu_version.md)** classifies around 300 Million reads (of length 100) per minute against all complete bacterial, viral, fungal and archaea genomes from NCBI RefSeq Release 202 running on a workstation with 4 NVIDIA(R) Tesla(R) V100 GPUs (32 GB model).
 
 
 
 
 ## Quick Start with NCBI RefSeq
-This will download MetaCache, compile it, download the complete bacterial, viral and archaea genomes from the latest NCBI RefSeq release (this can take some time) and build a classification database from them:
+on a Debian/Ubuntu system:
 
 ```
+sudo apt install -y zlib1g zlib1g-dev
 git clone https://github.com/muellan/metacache.git
 cd metacache
 make
 ./metacache-build-refseq
 ```
+This will
+* install the zlib library
+* download the MetaCache source code from GitHub
+* compile MetaCache (without GPU support)
+* download the complete bacterial, viral and archaea genomes from the latest NCBI RefSeq release (this can take some time)
+* build a classification database
 
 Once the default database is built you can classify reads:
 ```
@@ -36,56 +43,69 @@ Once the default database is built you can classify reads:
 
 ## Detailed Installation Instructions
 
-#### Requirements
-MetaCache itself should compile on any platform for which a C++14 conforming compiler is available. The Makefile is written with g++ or clang++ in mind, but could probably be adapted to MSVC or other compilers.
+Visit MetaCache's github [repository] to get the latest resources.
+
+* To compile the CPU version: run `make` in the directory containing the Makefile
+* To compile the GPU version, follow the instructions provided [here](docs/gpu_version.md).
+
+
+### CPU Version Requirements
+
+MetaCache itself should compile on any platform for which a C++14 conforming compiler is available. The Makefile is written with g++ or clang++ in mind, but could probably be adapted to (a very recent version of) MSVC or other compilers.
 
 The helper scripts (for downloading genomes, taxonomy etc.) require the Bash shell to run. That means you need a working bash executable as well as some common GNU utilities like "awk" and "wget". On Windows you should use the 'Windows Subsystem for Linux' (which gives you an Ubuntu user mode talking to the Windows Kernel).
 
-There are no dependencies on third party libraries.
-MetaCache was successfully tested on the following platforms (all 64 bit + 64 bit compilers):
-- Ubuntu 14.04 with g++ 5.4
-- Ubuntu 16.04 with g++ 5.3, g++ 7.2
-- Ubuntu 18.04 with g++ 5.4, g++ 7.4
-- Windows 10 Build 1709 64bit with MinGW-w64 g++ 7.2
-- Windows 10 Build 1909 64bit running Ubuntu 16.04 inside WSL and g++ 7.2
+MetaCache 2.0.0 was successfully tested on the following platforms (all 64 bit + 64 bit compilers):
+- Ubuntu 20.04 with g++ 5.4, g++ 7.4
+- Windows 10 20H2 running Ubuntu 20.04 inside WSL2 and g++ 10.3
 
 In order to be able to build the default database (based on NCBI RefSeq Release 97) with default settings your system should have around 64GB of RAM (note that the NCBI RefSeq will still be growing in the near future).
 If you don't have enough RAM, you can use [database partitioning](docs/partitioning.md).
 
-#### Get The Latest Sources
-Visit MetaCache's github [repository].
 
+### GPU Version Requirements
+The GPU version requires a CUDA-capable device of the Pascal generation or newer and either CUDA >= 11 or CUDA 10.2 and a self-provided version of [CUB](https://github.com/NVlabs/cub).
 
-#### Compile
-Run 'make' in the directory containing the Makefile.
-This will compile MetaCache with the default data type settings which support databases with up to 65,535 reference sequences (targets) and k-mer sizes up to 16. This offers a good database space efficiency and is currently sufficient for the complete bacterial, viral and archaea genomes from the NCBI RefSeq.
+See [here](docs/gpu_version.md) for more.
 
-If you want MetaCache to be able to process gzipped files make sure you have the zlib library installed on your system and compile with:
 
+### Library Requirements (CPU & GPU versions)
+MetaCache requires the zlib compression library to be installed on your system in order to be able to process gzipped FASTA/FASTQ files.
+On Debian/Ubuntu zlib can be installed with
 ```
-make MACROS="-DMC_ZLIB"
+sudo apt install -y zlib1g zlib1g-dev
 ```
+If you *don't* have zlib installed or cannot do so you can compile with:
+```
+make MC_ZLIB=NO
+```
+which will remove the zlib dependency and disables support for gzipped input files.
+
 
-Using the following compilation options you can compile MetaCache with support for more reference sequences and greater k-mer lengths.
+### Custom Configurations
 
-##### number of referece sequences (targets)
+If you run 'make' without additional parameters MetaCache will be compiled with the default data type settings which support databases with up to 65,535 reference sequences (targets) and k-mer sizes up to 16. This offers a good database space efficiency and is currently sufficient for the complete bacterial, viral and archaea genomes from the NCBI RefSeq.
 
-* support for up to 65,535 reference sequences (default):
+Using the following compilation options you can compile MetaCache with support for more targets and greater k-mer lengths.
+
+#### number of referece sequences (targets)
+
+* support for up to 65,535 targets (default):
 ```
 make MACROS="-DMC_TARGET_ID_TYPE=uint16_t"
 ```
 
-* support for up to 4,294,967,295 reference sequences (needs more memory):
+* support for up to 4,294,967,295 targets (needs more memory):
 ```
 make MACROS="-DMC_TARGET_ID_TYPE=uint32_t"
 ```
 
-* support for more than 4,294,967,295 reference sequences (needs even more memory)
+* support for more than 4,294,967,295 targets (needs even more memory)
 ```
 make MACROS="-DMC_TARGET_ID_TYPE=uint64_t"
 ```
 
-##### reference sequence lenghts
+#### reference sequence lenghts
 * support for targets up to a length of 4,294,967,295 windows (default)
 with default settings (window length, k-mer size) no sequence length must exceed 485.3 billion nucleotides
 ```
````
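The README bounds target length by the number of windows. A rough check, assuming each further window advances the start position by the window stride $s$ and a window has length $w$ (the GPU documentation lists the defaults `-winlen` 127 and `-winstride` 112), with $N$ the largest window count of a 32-bit window id:

$$
L_{\max} \approx (N-1)\,s + w \approx N\,s, \qquad N = 2^{32}-1 = 4{,}294{,}967{,}295
$$

$$
N \cdot 112 = 481{,}036{,}337{,}040 \approx 481.0 \times 10^{9}, \qquad
N \cdot 113 = 485{,}331{,}304{,}335 \approx 485.3 \times 10^{9}
$$

Under this assumption, the quoted 485.3 billion nucleotides matches a stride of 113. With the stride of 112 named in the GPU documentation, the bound comes out closer to 481.0 billion, so one of the two figures may be worth double-checking.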
````diff
@@ -98,8 +118,7 @@ Using the following compilation options you can compile MetaCache with support f
 make MACROS="-DMC_WINDOW_ID_TYPE=uint16_t"
 ```
 
-
-##### kmer lengths
+#### kmer lengths
 * support for kmer lengths up to 16 (default):
 ```
 make MACROS="-DMC_KMER_TYPE=uint32_t"
````
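The default `MC_KMER_TYPE=uint32_t` caps the k-mer length at 16. Assuming the common encoding of 2 bits per nucleotide (an assumption here; this diff does not state the encoding), a 32-bit integer holds exactly 16 bases. The hypothetical sketch below packs an ACGT string into an unsigned integer of a chosen width. `max_kmer_length` and `encode_kmer` are illustrative names, not MetaCache functions.

```cpp
// Hypothetical sketch (not MetaCache source), assuming 2 bits per nucleotide
// and 8-bit bytes.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Longest k-mer that fits into the unsigned integer type Kmer at 2 bits/base.
template <class Kmer>
constexpr int max_kmer_length() {
    return static_cast<int>(sizeof(Kmer) * 8 / 2);
}

// Packs an ACGT string into Kmer (A=0, C=1, G=2, T=3), first base in the
// most significant position. Throws if the k-mer does not fit or contains
// a character other than A, C, G, T (either case).
template <class Kmer>
Kmer encode_kmer(const std::string& seq) {
    if (seq.size() > static_cast<std::size_t>(max_kmer_length<Kmer>()))
        throw std::length_error("k-mer too long for the chosen k-mer type");
    Kmer code = 0;
    for (char c : seq) {
        unsigned base = 0;
        switch (c) {
            case 'A': case 'a': base = 0; break;
            case 'C': case 'c': base = 1; break;
            case 'G': case 'g': base = 2; break;
            case 'T': case 't': base = 3; break;
            default: throw std::invalid_argument("non-ACGT character in k-mer");
        }
        code = static_cast<Kmer>((code << 2) | base);  // append 2 bits
    }
    return code;
}
```

For example, `encode_kmer<std::uint32_t>("ACGT")` yields `0b00011011` (27), and a 17-base string is rejected for `std::uint32_t` but would fit a 64-bit type.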
````diff
@@ -112,14 +131,21 @@ Using the following compilation options you can compile MetaCache with support f
 
 You can of course combine these options (don't forget the surrounding quotes):
 ```
-make MACROS="-DMC_ZLIB -DMC_TARGET_ID_TYPE=uint32_t -DMC_WINDOW_ID_TYPE=uint32_t"
+make MACROS="-DMC_TARGET_ID_TYPE=uint32_t -DMC_WINDOW_ID_TYPE=uint32_t"
 ```
 
 **Note that a database can only be queried with the same variant of MetaCache (regarding data type sizes) that it was built with.**
 
 In rare cases databases built on one platform might not work with MetaCache on other platforms due to bit-endianness and data type width differences. Especially mixing MetaCache executables compiled with 32-bit and 64-bit compilers might be probelematic.
 
 
+#### disabling zlib support
+
+If you *don't* have the zlib compression library installed and/or want *don't* want gzipped input file support you can compile with:
+```
+make MC_ZLIB=NO
+```
+
 
 
 ## Building Databases
````
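The `MC_TARGET_ID_TYPE` option trades capacity for memory. The largest representable value bounds the number of targets, and the type width is paid for every stored id. The width presumably also ends up in the on-disk layout, which would explain why a database only works with the same variant. Below is a hypothetical sketch of such a compile-time switch, assuming a fallback to `std::uint16_t` when the macro is not given (mirroring the README default). `target_id`, `max_targets` and `bytes_per_target_id` are illustrative names, not MetaCache code.

```cpp
// Hypothetical sketch (not MetaCache source) of a compile-time switch like
// make MACROS="-DMC_TARGET_ID_TYPE=uint32_t".
#include <cassert>
#include <cstddef>
#include <cstdint>  // common toolchains also declare unqualified uint32_t etc.
#include <limits>

#ifndef MC_TARGET_ID_TYPE
#define MC_TARGET_ID_TYPE std::uint16_t  // fallback mirrors the README default
#endif

using target_id = MC_TARGET_ID_TYPE;

static_assert(std::numeric_limits<target_id>::is_integer &&
                  !std::numeric_limits<target_id>::is_signed,
              "target id type must be an unsigned integer type");

// Largest value of the chosen type; the README's target limits
// (65,535 for uint16_t, 4,294,967,295 for uint32_t) coincide with it.
constexpr unsigned long long max_targets() {
    return std::numeric_limits<target_id>::max();
}

// Memory paid for every stored target id.
constexpr std::size_t bytes_per_target_id() { return sizeof(target_id); }
```

Built without flags, this reports 65,535 targets at 2 bytes per id. With `-DMC_TARGET_ID_TYPE=uint32_t` it reports 4,294,967,295 targets at 4 bytes per id.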
```diff
@@ -160,8 +186,9 @@ Once a database (e.g. the standard 'refseq'), is built you can classify reads.
 
 ## Documentation of Command Line Parameters
 
-* [for mode `build`](docs/mode_build.txt): build database from reference genomes
+* [for mode `build`](docs/mode_build.txt): build database from reference genomes (and write it to disk)
 * [for mode `query`](docs/mode_query.txt): query reads against database
+* [for mode `build+query`](docs/mode_build_query.txt): build reference database and immediately query reads (mainly recommended for GPU version)
 * [for mode `merge`](docs/mode_merge.txt): merge results of independent queries
 * [for mode `modify`](docs/mode_modify.txt): add reference genomes to database or update taxonomy
 * [for mode `info`](docs/mode_info.txt): obtain information about a database
```

dep/hpc_helpers/LICENSE

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2020 Parallel and Distributed Architectures
+Copyright (c) 2021 Parallel and Distributed Architectures
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```

docs/gpu_version.md

Lines changed: 62 additions & 30 deletions

````diff
@@ -1,69 +1,101 @@
+
 # MetaCache-GPU
 
-## Installation Instructions
 
-#### Requirements
+## Example Installation
+on an Ubuntu system with NVIDIA Quattro GV100 GPUs and CUDA SDK version 11 installed:
+```
+sudo apt install -y zlib1g zlib1g-dev
+git clone https://github.com/muellan/metacache.git
+cd metacache
+git submodule update --init --recursive
+make gpu CUDA_ARCH=sm_70
+```
+See below for more details.
 
-The GPU version of MetaCache requires a CUDA-capable device of the Pascal generation or newer and either:
 
-* CUDA >= 11
-* CUDA 10.2 and a self-provided version of [CUB](https://github.com/NVlabs/cub)
 
-Make sure to adjust the Makefile to the GPU generation you want to use by setting the `-arch` flag (e.g. `-arch=sm_70` for Quadro GV100). You also have to set the include path for CUB if your CUDA version is below CUDA 11.
+## Requirements
 
-MetaCache-GPU depends on the hashtable implementation of [warpcore](https://github.com/sleeepyjack/warpcore) and the sorting algorithm [bb_segsort](https://github.com/Funatiq/bb_segsort). Both repositories are included as submodules and need to be checked out in addition to MetaCache itself. You can do so be calling
+### Hardware Requirements
 
-```git submodule update --init --recursive```
+The GPU version of MetaCache requires a CUDA-capable device of the Pascal generation or newer.
 
-In order to be able to build the default database (based on NCBI RefSeq Release 97) with default settings your system will need a total of 120 GB of GPU memory (e.g. 4x GPUs with 32 GB each).
+In order to be able to build the default database (based on NCBI RefSeq Release 97) with default settings your system will need a total of 120 GB of GPU memory (e.g. 4x GPUs with 32 GB each).
 If you don't have enough GPU memory, you can use [database partitioning](docs/partitioning.md).
 
-#### Compile
-Run '`make gpu_release`' in the directory containing the Makefile.
-This will compile MetaCache-GPU with support for:
 
-* up to 4,294,967,295 reference sequences
-* targets up to a length of 4,294,967,295 windows
-* kmer lengths up to 16
+### Software Dependencies
+
+* CUDA SDK
+* CUDA >= 11
+* CUDA 10.2 and a self-provided version of [CUB](https://github.com/NVlabs/cub) (you also need to set the include path for CUB by supplying `INCLUDE=*your_cub_path*` when calling make)
+
+* Hashtable library [warpcore](https://github.com/sleeepyjack/warpcore) and sorting library [bb_segsort](https://github.com/Funatiq/bb_segsort). Both repositories are included as submodules and need to be checked out in addition to MetaCache itself. You can do so by calling
+```git submodule update --init --recursive```
+
+* Support for gzipped FASTA/FASTQ files requires the zlib compression library to be installed on your system.
+On Debian/Ubuntu zlib can be installed with
+`sudo apt install -y zlib1g zlib1g-dev`. If you *don't* have zlib installed or cannot do so you can compile with `make MC_ZLIB=NO`
+which will remove the zlib dependency and disables support for gzipped input files.
+
+
+## Installation / Compiling
+
+Run `make` in the directory containing the Makefile and set the GPU generation with the `CUDA_ARCH` flag (e.g. `CUDA_ARCH=sm_70` for Quadro GV100):
+```
+make gpu CUDA_ARCH=sm_70
+```
+
+If you don't supply additional parameters MetaCache will be compiled with the default data type settings which support databases with
+
+* up to 4,294,967,295 targets (= reference sequences)
+* targets with a length of up to 4,294,967,295 windows (which corresponds to approximately 485.3 billion nucleotides with the default window size of 112)
+* kmers with a lengths of up to 16
 
 This corresponds to the CPU version compiled with `make MACROS="-DMC_TARGET_ID_TYPE=uint32_t"`
 
-**Note that a database build by the GPU version can be queried by the corresponding CPU version and vice versa. The only restriction is the available (GPU) memory.**
+**A database built by the GPU version can be queried by the corresponding CPU version and vice versa. The only restriction is the available (GPU) memory.**
+
 
 
 ## Differences to CPU version
 
 MetaCache-GPU allows to **build** distributed databases across multiple GPUs.
-In difference to the [database partitioning](docs/partitioning.md) approach, the program distributes the reference genomes automatically across the GPUs in a single run. Due to the dynamic distribution scheme and the concurrent execution on the GPUs, two database builds for the same input files will most likely differ. However, this should have only a small impact on classification performance.
+In difference to the [database partitioning](docs/partitioning.md) approach, the reference genomes are automatically distributed across multiple GPUs in a single run. Due to the dynamic distribution scheme and the concurrent execution on the GPUs, two database builds for the same input files will most likely differ. However, this should only have a negligible impact on classification performance.
+
+In order to **query** a multi-GPU database make sure to set the same number of GPUs when using the query mode.
+
+### Build+Query Immediate Mode
+Since building databases is significantly faster on the GPU than on the CPU and will often take less than a minute, the [build+query mode](docs/mode_build_query.txt) can be used to build and directly query a database without writing the database to disk.
 
-In order to **query** a multi-GPU database make sure to set the same number of GPUs when using the query mode. Note, that only a small number of threads is needed to saturate the GPU query pipeline.
 
-#### Command Line Options
+### Command Line Options
 
 The command line options of the GPU version are similar to the CPU version with a few notable exceptions:
 
-##### mode build
+#### mode build
 
 * `-parts <#>` sets the number of GPUs to use (default: all available GPUs).
 
-##### mode query
+#### mode query
 
 * `-replicate <#>` enables multiple GPU pipelines (default: 1). Each pipeline occupies one GPU per database part.
 
-##### mode build & mode query
+#### mode build & mode query
 
 * `-kmerlen` kmer length is limited to 16 (default: 16).
 * `-sketchlen` sketch length is limited to 16 (default: 16).
 * `-winlen` window length is limited to 127 (default: 127).
-* `-winstride` window stride has to be multiple of 4 (default: 112).
-* `-remove-overpopulated-features` is not supported.
-* `-remove-ambig-features` is not supported.
+* `-winstride` window stride has to be a multiple of 4 (default: 112).
+* `-remove-overpopulated-features` is *not* supported.
+* `-remove-ambig-features` is *not* supported.
 
-##### mode info
+#### mode info
 
-* feature map is not available.
-* feature counts are not available.
+* submode `locations`is *not* available.
+* submode `featurecounts` is *not* available.
 
-##### mode merge
+#### mode merge
 
-* merging on GPU is not available and will fall back to CPU version.
+Merging multiple result files will *not* be performed on the GPU and will fall back to the CPU.
````
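The GPU limits listed under "mode build & mode query" boil down to a few range and divisibility checks. Below is a hypothetical C++14 sketch, assuming a plain options struct. `gpu_sketch_options` and `check_gpu_limits` are illustrative names, not part of MetaCache. The lower bounds (at least 1, positive stride) are assumptions that the documentation does not state.

```cpp
// Hypothetical sketch (not MetaCache source) of the documented GPU limits.
#include <cassert>
#include <string>
#include <vector>

struct gpu_sketch_options {
    int kmerlen   = 16;   // documented default and maximum
    int sketchlen = 16;   // documented default and maximum
    int winlen    = 127;  // documented default and maximum
    int winstride = 112;  // documented default; must be a multiple of 4
};

// Collects one message per violated limit; an empty result means "accepted".
inline std::vector<std::string> check_gpu_limits(const gpu_sketch_options& o) {
    std::vector<std::string> errors;
    if (o.kmerlen < 1 || o.kmerlen > 16)
        errors.push_back("-kmerlen must be between 1 and 16");
    if (o.sketchlen < 1 || o.sketchlen > 16)
        errors.push_back("-sketchlen must be between 1 and 16");
    if (o.winlen < 1 || o.winlen > 127)
        errors.push_back("-winlen must be between 1 and 127");
    if (o.winstride <= 0 || o.winstride % 4 != 0)
        errors.push_back("-winstride must be a positive multiple of 4");
    return errors;
}
```

The documented defaults pass all checks. A stride of 110 (not a multiple of 4) produces exactly one message.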

docs/mode_build.txt

Lines changed: 3 additions & 0 deletions

```diff
@@ -93,18 +93,21 @@ ADVANCED OPTIONS
 family, suborder, order, subclass, class, subphylum,
 phylum, subkingdom, kingdom, domain
 default: off
+Not available in the GPU version.
 
 -max-ambig-per-feature <#>
 Maximum number of allowed different reference sequence
 taxa per feature if option '-remove-ambig-features' is
 used.
+Not available in the GPU version.
 
 -max-load-fac <factor>
 maximum hash table load factor;
 This can be used to trade off larger memory consumption
 for speed and vice versa. A lower load factor will improve
 speed, a larger one will improve memory efficiency.
 default: 0.800000
+Not available in the GPU version.
 
 -parts <#> Splits the database into multiple parts. Each part
 contains a separate hash table.
```
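The `-max-load-fac` trade-off can be made concrete. For a maximum load factor α, a table holding n entries needs at least ⌈n/α⌉ slots, so the default of 0.8 implies 1.25 slots per stored entry, and 0.5 would double the slot count. Below is a hypothetical helper illustrating only this arithmetic. `min_slots` is an illustrative name, and MetaCache's hash table may size itself differently (for example by rounding capacities up).

```cpp
// Hypothetical helper (not MetaCache source) for the -max-load-fac trade-off.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Smallest slot count keeping entries / slots <= max_load. Computed in
// double precision, so results for very large counts are approximate.
inline std::size_t min_slots(std::size_t entries, double max_load) {
    if (!(max_load > 0.0 && max_load <= 1.0))
        throw std::invalid_argument("load factor must be in (0, 1]");
    return static_cast<std::size_t>(
        std::ceil(static_cast<double>(entries) / max_load));
}
```

Lowering the load factor from 1.0 to 0.5 doubles `min_slots` for the same number of entries. This is the "larger memory consumption for speed" side of the trade-off described above.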
