Add MMseqs2 clustering and taxonomy #6574

Open
wants to merge 63 commits into
base: main
6467853
Init mmseqs2
clsiguret Oct 15, 2024
77d1bef
init DM
hugolefeuvre Oct 15, 2024
3ad2ea4
continue DM
hugolefeuvre Oct 16, 2024
b9da697
Split to TOOL_VERSION and COMMIT
clsiguret Oct 16, 2024
0ac3019
Modify macros and json output
hugolefeuvre Oct 16, 2024
75f154d
update macro
clsiguret Oct 16, 2024
fc600b6
init mmseqs2_taxonomy
clsiguret Oct 16, 2024
403a2d9
init mmseqs2_createtaxdb
clsiguret Oct 16, 2024
6c308bb
Change name and description
clsiguret Oct 16, 2024
afd1577
init mmseqs2_createdb
clsiguret Oct 16, 2024
ec54759
init mmseqs2_createtsv
clsiguret Oct 16, 2024
d48ab99
init mmseqs2_createtsv
clsiguret Oct 16, 2024
71779aa
Merge branch 'mmseqs2' of github.com:clsiguret/tools-iuc into mmseqs2
clsiguret Oct 16, 2024
5f584df
continue DM
hugolefeuvre Oct 16, 2024
0c250c5
init taxonomyreport
clsiguret Oct 17, 2024
f96947c
add test files for createtsv
clsiguret Oct 17, 2024
09771c7
Add second test with other data table
hugolefeuvre Oct 17, 2024
7f2c49e
add double quote
hugolefeuvre Oct 17, 2024
88da237
start create_db
hugolefeuvre Oct 17, 2024
157360f
continue mmseqs2 DM (macros modification)
hugolefeuvre Oct 18, 2024
410d4f4
put all xml into one
clsiguret Oct 18, 2024
3c3d188
put all xml into one
clsiguret Oct 18, 2024
8fbf055
Merge branch 'galaxyproject:main' into mmseqs2
hugolefeuvre Oct 18, 2024
b82acf0
add createdb section
hugolefeuvre Oct 18, 2024
0f859b5
add createtaxdb and filtertaxseqdb sections
clsiguret Oct 18, 2024
0bd7ef7
Update taxonomy assignement : taxonomy module prefilter options
hugolefeuvre Oct 18, 2024
e1299a9
taxonomy part : align parameters
hugolefeuvre Oct 21, 2024
76da478
taxonomy module : misc and common options
hugolefeuvre Oct 22, 2024
d0fac9f
all parameters into xml
hugolefeuvre Oct 23, 2024
279f796
finish wrapping command and start tests
hugolefeuvre Oct 23, 2024
cb26bf4
Change tool name
hugolefeuvre Oct 24, 2024
aafcdfe
Merge branch 'galaxyproject:main' into mmseqs2
hugolefeuvre Oct 30, 2024
2333a5a
Merge branch 'mmseqs2_DM' into mmseqs2
hugolefeuvre Oct 30, 2024
79f3410
add new loc.sample file and modification to pass tests
hugolefeuvre Oct 30, 2024
dd5ab01
issue with database : test dont select test database
hugolefeuvre Oct 31, 2024
cbf00be
group multiple conditionnal part
hugolefeuvre Nov 4, 2024
29e9015
start mmseqs2 easy-linclust wrapper
hugolefeuvre Nov 4, 2024
1d28b66
finish mmseqs2 linclust wrapping
hugolefeuvre Nov 7, 2024
53c1550
start easy-taxo wrapper, I want to compare taxonomy and easy-taxonomy
hugolefeuvre Nov 7, 2024
6649879
modify easy-taxonomy : conditionnal and resolve DB issue
hugolefeuvre Nov 8, 2024
8bac29c
Merge branch 'galaxyproject:main' into mmseqs2
hugolefeuvre Nov 8, 2024
784bb52
update taxo : issue with mmseqs, update with easy-taxo : issue with i…
hugolefeuvre Nov 12, 2024
5f3a7e3
filter kraken or krona output
hugolefeuvre Nov 14, 2024
98e72f3
start modify DM
hugolefeuvre Nov 15, 2024
eb4187d
modify DM path and json informations
hugolefeuvre Nov 15, 2024
1ba4a1f
wrong value
hugolefeuvre Nov 18, 2024
f46798e
start multiple datatable management
hugolefeuvre Nov 18, 2024
65c07c3
add nucleotide data table into param
hugolefeuvre Nov 18, 2024
3efd6ab
Reduced database, possibility of having the 2 types of report
hugolefeuvre Nov 19, 2024
78ba0c6
Merge branch 'galaxyproject:main' into mmseqs2
hugolefeuvre Nov 19, 2024
220cd5a
delete useless files and parameters + last tests
hugolefeuvre Nov 19, 2024
c729ba6
try to chmod Swiss-Prot_taxonomy because error could not open for wri…
hugolefeuvre Nov 19, 2024
072fc28
Create a symlink of the database to the job working directory
hugolefeuvre Nov 20, 2024
54a432f
try symlink with a directory
hugolefeuvre Nov 21, 2024
fe0b983
try with database cp and modify DM
hugolefeuvre Nov 21, 2024
3637da0
remove filtertaxseqdb conditionnal
hugolefeuvre Nov 21, 2024
6d07b87
few changes
hugolefeuvre Nov 21, 2024
940613d
macros parameters
hugolefeuvre Nov 22, 2024
09e0d94
Revert "macros parameters"
hugolefeuvre Nov 22, 2024
41cb56d
Revert "Revert "macros parameters""
hugolefeuvre Nov 25, 2024
2e2364d
modify alph_type conditionnal, remove createdb-mode parameter and TWI…
hugolefeuvre Nov 25, 2024
816451e
include the commit in the tool version, add .lint_skip file to skip T…
hugolefeuvre Dec 2, 2024
dde715a
few modifications on DM and tools wrapper
hugolefeuvre Dec 2, 2024
1 change: 1 addition & 0 deletions data_managers/data_manager_mmseqs2_database/.lint_skip
@@ -0,0 +1 @@
ToolVersionPEP404
14 changes: 14 additions & 0 deletions data_managers/data_manager_mmseqs2_database/.shed.yml
@@ -0,0 +1,14 @@
name: data_manager_mmseqs2_database
owner: iuc
description: "MMseqs2 is an ultra fast and sensitive sequence search and clustering suite"
homepage_url: "https://github.com/soedinglab/MMseqs2"
long_description: |
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows.
The software is designed to run on multiple cores and servers and exhibits very good scalability.
MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed it achieves almost the same sensitivity.
It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.
remote_repository_url: "https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_mmseqs2_database"
type: unrestricted
categories:
- Data Managers
@@ -0,0 +1,141 @@
<tool id="data_manager_mmseqs2_download" name="Download MMseqs2 databases" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" tool_type="manage_data" profile="22.05">
<description></description>
<macros>
<token name="@TOOL_VERSION@">15.6f452</token>
<token name="@VERSION_SUFFIX@">0</token>
</macros>
<requirements>
<requirement type="package" version="@TOOL_VERSION@">mmseqs2</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
#set $database_name = str($database).split('/')[-1] if '/' in str($database) else str($database)
mkdir -p '$database_name' &&
mkdir -p '$out_file.extra_files_path' &&
mmseqs databases
Contributor

One thing I do not like about this is that most of the downloaded databases are not properly versioned. For instance, GTDB is downloaded from latest: https://github.com/soedinglab/MMseqs2/blob/c2c3ad9c2956fac691d5a6041a9a4affa7fa27ad/data/workflow/databases.sh#L148

So we cannot guarantee that GTDB on one Galaxy is the same as GTDB on another Galaxy.

That is not your fault; it is an upstream issue. I would suggest asking upstream whether they could provide versioned downloads.

Contributor

Maybe it would be better to have a data manager that takes FASTA as input and calls mmseqs createdb, like here: https://github.com/soedinglab/MMseqs2/blob/c2c3ad9c2956fac691d5a6041a9a4affa7fa27ad/data/workflow/databases.sh#L388

The FASTA could be taken from other data tables, but that is difficult because there would be multiple databases.

Or is the FASTA removed before it is added to the data table? They call rmdb...

The main question for me is: what is actually stored in the output folder, and how big is it?

Contributor Author

Yes, I agree with you about the database versions. Should I create an issue on the MMseqs2 repository to ask them?

Maybe it would be better to have a data manager that takes fasta as input and calls mmseqs createdb
...fasta could be taken from other data tables .. but its difficult, because it will be multiple data bases.

If I understand correctly, the idea would be to take the FASTA files that already exist in Galaxy for the databases offered by mmseqs and use createdb to build a database that can be used in the command suite, without having to download the mmseqs databases. I wonder how easy it is to find out which FASTA files are used to build the mmseqs databases.

Main question for me is: what is actually stored in the output folder and how big is it?

The output folder of the createdb command has the same composition as the output of mmseqs databases (you can find an example in the Swiss-Prot directory in the test files). It contains a text file with the sequences representing the database (not in FASTA format), index files, and files containing general information (lookup file, identifiers assigned by MMseqs2, and their correspondence with the original sequences).
For the test file, which is 568K, the createdb output directory is 836K (I don't know whether such a small file is a useful reference).

Contributor

do I need to create an issue on MMseqs repo to ask them ?

This would be perfect.

idea would be to take the fasta files already existing in Galaxy from the databases

This would be my idea.

I wonder how easy it is to find out which fasta files are used to build mmseqs databases

This would be good anyway; otherwise we cannot answer this question for users.

'$database' '$database_name'/database
hugolefeuvre marked this conversation as resolved.
'tmp'
--threads "\${GALAXY_SLOTS:-1}" &&
mv '$database_name' '$out_file.extra_files_path' &&
cp '$dmjson' '$out_file'
]]></command>
<configfiles>
<configfile name="dmjson"><![CDATA[
#from datetime import date
#set $database_name = str($database).split('/')[-1] if '/' in str($database) else str($database)
{
"data_tables":{
"$db_name.type":[
{
"value": "${database}-@TOOL_VERSION@-#echo date.today().strftime('%d%m%Y')#",
"name": "${database} #echo date.today().strftime('%d%m%Y')#",
"path": "$database_name",
"version": "@TOOL_VERSION@"
}
]
}
}]]>
</configfile>
</configfiles>
<inputs>
<conditional name="db_name">
<param argument="type" type="select" label="Type of databases">
<option value="mmseqs2_aminoacid_databases" selected="true">Amino acid databases</option>
<option value="mmseqs2_aminoacid_taxonomy_databases">Amino acid databases that can be used for taxonomy</option>
<option value="mmseqs2_nucleotide_databases">Nucleotide databases</option>
<option value="mmseqs2_nucleotide_taxonomy_databases">Nucleotide databases that can be used for taxonomy</option>
<option value="mmseqs2_profile_databases">Profile databases</option>
</param>
<when value="mmseqs2_aminoacid_databases">
<param name="database" type="select" label="MMseqs2 amino acid databases">
<option value="UniRef100" selected="true">UniRef100</option>
<option value="UniRef90">UniRef90</option>
<option value="UniRef50">UniRef50</option>
<option value="UniProtKB">UniProtKB</option>
<option value="UniProtKB/TrEMBL">TrEMBL (UniProtKB)</option>
<option value="UniProtKB/Swiss-Prot">Swiss-Prot (UniProtKB)</option>
<option value="NR">NR (Non-redundant protein sequences from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq)</option>
<option value="GTDB">GTDB (Genome Taxonomy Database)</option>
<option value="PDB">PDB (The Protein Data Bank)</option>
</param>
</when>
<when value="mmseqs2_aminoacid_taxonomy_databases">
<param name="database" type="select" label="MMseqs2 amino acid databases that can be used for taxonomy">
<option value="UniRef100" selected="true">UniRef100</option>
<option value="UniRef90">UniRef90</option>
<option value="UniRef50">UniRef50</option>
<option value="UniProtKB">UniProtKB</option>
<option value="UniProtKB/TrEMBL">TrEMBL (UniProtKB)</option>
<option value="UniProtKB/Swiss-Prot">Swiss-Prot (UniProtKB)</option>
<option value="NR">NR (Non-redundant protein sequences from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq)</option>
<option value="GTDB">GTDB (Genome Taxonomy Database)</option>
Contributor

What exactly are the databases that are downloaded (FASTA, some index, or something else)? For instance, is gtdb the full GTDB?

  • If so, we have other data tables that manage these; given the huge size of GTDB, it would not make sense to duplicate it.

Contributor Author

I'm pretty sure that the gtdb mmseqs database is not the full GTDB. I'm downloading it with mmseqs databases and it reports 66GB, while the full GTDB release 220 is 107GB.
In addition, the output format is not the same: mmseqs databases lays out its directory in a very particular way and has its own files (as explained for createdb in the previous comment: a text file with the sequences representing the database, index files, and general information files).

Contributor

OK. So one thing we should probably take care of (if we use FASTA from other data tables) is that the mmseqs files are installed to a separate folder.

Contributor Author

It doesn't have to be in a separate folder, but I prefer to work that way because the output consists of several files.
After checking, the GTDB database downloaded with mmseqs is 170GB in the end, with the file containing the sequences being 100GB.

</param>
</when>
<when value="mmseqs2_nucleotide_databases">
<param name="database" type="select" label="MMseqs2 nucleotide databases">
<option value="SILVA">SILVA</option>
<option value="Kalamari">Kalamari</option>
<option value="NT">NT (Partially non-redundant nucleotide sequences from all traditional divisions of GenBank, EMBL, and DDBJ excluding GSS, STS, PAT, EST, HTG, and WGS)</option>
<option value="Resfinder">Resfinder</option>
</param>
</when>
<when value="mmseqs2_nucleotide_taxonomy_databases">
<param name="database" type="select" label="MMseqs2 nucleotide databases that can be used for taxonomy">
<option value="SILVA">SILVA</option>
<option value="Kalamari">Kalamari</option>
</param>
</when>
<when value="mmseqs2_profile_databases">
<param name="database" type="select" label="MMseqs2 profile databases">
<option value="PDB70">PDB70 (PDB clustered to 70% sequence identity)</option>
<option value="Pfam-A.full">Pfam-A.full</option>
<option value="Pfam-A.seed">Pfam-A.seed</option>
<option value="Pfam-B">Pfam-B</option>
<option value="CDD">CDD (Conserved Domain Database)</option>
<option value="VOGDB">VOGDB (Virus Orthologous Groups)</option>
<option value="dbCAN2">dbCAN2 (database of carbohydrate-active enzymes)</option>
</param>
</when>
</conditional>
</inputs>
<outputs>
<data name="out_file" format="data_manager_json" label="${tool.name}"/>
</outputs>
<tests>
<test expect_num_outputs="1">
<conditional name="db_name">
<param name="type" value="mmseqs2_nucleotide_taxonomy_databases" />
<param name="database" value="SILVA" />
</conditional>
<output name="out_file">
<assert_contents>
<has_text text='"mmseqs2_nucleotide_taxonomy_databases":'/>
<has_text text='"version": "15.6f452"'/>
<has_text_matching expression='"value": "SILVA-15.6f452-[0-9]{8}"'/>
<has_text_matching expression='"name": "SILVA [0-9]{8}"'/>
<has_text text='"path": "SILVA"'/>
</assert_contents>
</output>
</test>
<test expect_num_outputs="1">
<conditional name="db_name">
<param name="type" value="mmseqs2_aminoacid_taxonomy_databases" />
<param name="database" value="UniProtKB/Swiss-Prot" />
</conditional>
<output name="out_file">
<assert_contents>
<has_text text='"mmseqs2_aminoacid_taxonomy_databases":'/>
<has_text text='"version": "15.6f452"'/>
<has_text_matching expression='"value": "UniProtKB/Swiss-Prot-15.6f452-[0-9]{8}"'/>
<has_text_matching expression='"name": "UniProtKB/Swiss-Prot [0-9]{8}"'/>
<has_text text='"path": "Swiss-Prot"'/>
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[
This tool downloads databases that can be used with MMseqs2.
]]></help>
<citations>
<citation type="doi">10.1038/nbt.3988</citation>
</citations>
</tool>
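For readers following the `dmjson` configfile above, here is a minimal Python sketch of the entry it appears to produce. It reuses the same expressions as the Cheetah template (`split('/')[-1]` and `strftime('%d%m%Y')`). The function name `build_entry` and the fixed date are illustrative assumptions, not part of the PR.

```python
from datetime import date

# Value of the @TOOL_VERSION@ token in the tool XML above.
TOOL_VERSION = "15.6f452"


def build_entry(database: str, table: str, today: date) -> dict:
    """Hypothetical helper mirroring the dmjson configfile for one data table."""
    # Same expression as the Cheetah #set line: keep the part after the last '/'.
    database_name = database.split("/")[-1] if "/" in database else database
    stamp = today.strftime("%d%m%Y")
    return {
        "data_tables": {
            table: [
                {
                    "value": f"{database}-{TOOL_VERSION}-{stamp}",
                    "name": f"{database} {stamp}",
                    "path": database_name,
                    "version": TOOL_VERSION,
                }
            ]
        }
    }


table = "mmseqs2_aminoacid_taxonomy_databases"
row = build_entry("UniProtKB/Swiss-Prot", table, date(2024, 12, 2))["data_tables"][table][0]
print(row["value"])  # UniProtKB/Swiss-Prot-15.6f452-02122024
print(row["path"])   # Swiss-Prot
```

Note that `value` keeps the full `UniProtKB/Swiss-Prot` string while `path` keeps only `Swiss-Prot`, which is what the second test in the tool XML asserts.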
79 changes: 79 additions & 0 deletions data_managers/data_manager_mmseqs2_database/data_manager_conf.xml
@@ -0,0 +1,79 @@
<data_managers>
<data_manager tool_file="data_manager/data_manager_mmseqs2_download.xml" id="mmseqs2_download_databases">
<data_table name="mmseqs2_aminoacid_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file">
<move type="directory">
<source>${path}</source>
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">mmseqs2/${path}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
<column name="version"/>
</output>
</data_table>
<data_table name="mmseqs2_aminoacid_taxonomy_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file">
<move type="directory">
<source>${path}</source>
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">mmseqs2/${path}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
<column name="version"/>
</output>
</data_table>
<data_table name="mmseqs2_nucleotide_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file">
<move type="directory">
<source>${path}</source>
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">mmseqs2/${path}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
<column name="version"/>
</output>
</data_table>
<data_table name="mmseqs2_nucleotide_taxonomy_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file">
<move type="directory">
<source>${path}</source>
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">mmseqs2/${path}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
<column name="version"/>
</output>
</data_table>
<data_table name="mmseqs2_profile_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file">
<move type="directory">
<source>${path}</source>
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">mmseqs2/${path}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
<column name="version"/>
</output>
</data_table>
</data_manager>
</data_managers>
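The two `value_translation` steps in each `path` column above can be read as a template substitution followed by `abspath`. The sketch below imitates that with the Python standard library only. It is an approximation for illustration, not Galaxy's implementation, and `translate_path` and the example base directory are assumed names.

```python
import os
from string import Template


def translate_path(path: str, data_manager_data_path: str) -> str:
    """Hypothetical stand-in for the two value_translation steps."""
    # Step 1: fill in the template used in data_manager_conf.xml.
    template = Template("${GALAXY_DATA_MANAGER_DATA_PATH}/mmseqs2/${path}")
    expanded = template.substitute(
        GALAXY_DATA_MANAGER_DATA_PATH=data_manager_data_path,
        path=path,
    )
    # Step 2: the "abspath" function translation.
    return os.path.abspath(expanded)


resolved = translate_path("Swiss-Prot", "/tmp/galaxy/tool-data")
print(resolved)
```

Because every data table moves its directory to `mmseqs2/${path}`, two tables that register the same database name would resolve to the same target directory under this reading.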
@@ -0,0 +1 @@
UniProtKB/Swiss-Prot-15.6f452-02122024 UniProtKB/Swiss-Prot 02122024 /tmp/tmphqvxgt7v/galaxy-dev/tool-data/mmseqs2/Swiss-Prot 15.6f452
@@ -0,0 +1 @@
SILVA-15.6f452-02122024 SILVA 02122024 /tmp/tmphqvxgt7v/galaxy-dev/tool-data/mmseqs2/SILVA 15.6f452
@@ -0,0 +1,4 @@
#This is a sample file distributed with Galaxy that enables tools
#to use a directory of MMseqs2 database files.
#The file has this format (whitespace characters are TAB characters):
#UniRef100-15.6f452-16102024	UniRef100 16102024	/path/to/data	15.6f452
@@ -0,0 +1,22 @@
<tables>
<table name="mmseqs2_aminoacid_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="tool-data/mmseqs2_aminoacid_databases.loc"/>
</table>
<table name="mmseqs2_aminoacid_taxonomy_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="tool-data/mmseqs2_aminoacid_taxonomy_databases.loc"/>
</table>
<table name="mmseqs2_nucleotide_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="tool-data/mmseqs2_nucleotide_databases.loc"/>
</table>
<table name="mmseqs2_nucleotide_taxonomy_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="tool-data/mmseqs2_nucleotide_taxonomy_databases.loc"/>
</table>
<table name="mmseqs2_profile_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="tool-data/mmseqs2_profile_databases.loc"/>
</table>
</tables>
@@ -0,0 +1,22 @@
<tables>
<table name="mmseqs2_aminoacid_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="${__HERE__}/test-data/mmseqs2_aminoacid_databases.loc.test"/>
</table>
<table name="mmseqs2_aminoacid_taxonomy_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="${__HERE__}/test-data/mmseqs2_aminoacid_taxonomy_databases.loc.test"/>
</table>
<table name="mmseqs2_nucleotide_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="${__HERE__}/test-data/mmseqs2_nucleotide_databases.loc.test"/>
</table>
<table name="mmseqs2_nucleotide_taxonomy_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="${__HERE__}/test-data/mmseqs2_nucleotide_taxonomy_databases.loc.test"/>
</table>
<table name="mmseqs2_profile_databases" comment_char="#">
<columns>value, name, path, version</columns>
<file path="${__HERE__}/test-data/mmseqs2_profile_databases.loc.test"/>
</table>
</tables>
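The table definitions above declare four columns (`value, name, path, version`) and `#` as the comment character. A small Python sketch of reading such a `.loc` file follows. It assumes the fields are TAB-separated, as the sample file states, and places the tabs in the example row by guesswork, since whitespace was flattened in this view. `parse_loc` is a hypothetical helper, not Galaxy code.

```python
COLUMNS = ("value", "name", "path", "version")


def parse_loc(text: str, comment_char: str = "#") -> list:
    """Parse .loc content into one dict per data row (illustrative only)."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and comment lines, as comment_char="#" implies.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Tab positions below are an assumption based on the declared column order.
sample = (
    "#value\tname\tpath\tversion\n"
    "SILVA-15.6f452-02122024\tSILVA 02122024\t"
    "/tmp/tmphqvxgt7v/galaxy-dev/tool-data/mmseqs2/SILVA\t15.6f452\n"
)
rows = parse_loc(sample)
print(rows[0]["name"])     # SILVA 02122024
print(rows[0]["version"])  # 15.6f452
```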
1 change: 1 addition & 0 deletions tools/mmseqs2/.lint_skip
@@ -0,0 +1 @@
ToolVersionPEP404
21 changes: 21 additions & 0 deletions tools/mmseqs2/.shed.yml
@@ -0,0 +1,21 @@
name: mmseqs2
owner: iuc
description: MMseqs2 is an ultra fast and sensitive sequence search and clustering suite
long_description: |
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets.
MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows.
The software is designed to run on multiple cores and servers and exhibits very good scalability.
MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed it achieves almost the same sensitivity.
It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.
categories:
- Sequence Analysis
- Metagenomics
homepage_url: https://github.com/soedinglab/MMseqs2
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/mmseqs2
type: unrestricted
auto_tool_repositories:
name_template: "{{ tool_id }}"
description_template: "Wrapper for the MMseqs2 tool suite: {{ tool_name }}"
suite:
name: "suite_mmseqs2"
description: "MMseqs2 is an ultra fast and sensitive sequence search and clustering suite"