-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data manager for clair3 models #6659
base: main
Are you sure you want to change the base?
Conversation
<requirement type="package" version="3.12">python</requirement> | ||
</requirements> | ||
<command detect_errors="exit_code"><![CDATA[ | ||
## this code looks up the existing table and uses it to build a list of known models |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the known_models
code doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ensures that the same model is not downloaded twice (duplicates are not allowed in the data table).
@@ -0,0 +1,6 @@ | |||
<tables> | |||
<table name="clair3_models" comment_char="#" allow_duplicate_entries="False"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have assumed that Galaxy automatically does not allow duplicates if allow_duplicate_entries="False"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively the value_in_data_table
might be easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it lets you provide duplicate entries and then gives an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value_in_data_table
works for manually entered values, but doesn't work for the latest
mode - i.e. where the DM downloads the latest models.
In a discussion with Jack Jacobs of ONT this morning he confirmed that ONT would prefer some kind of wording associated with a checkbox that users can check to acknowledge that they comply with the terms of ONT's license for these models. He will revert with specific wording. The main concern from ONT is not a commercial user simply using the models but rather a commercial user using the models to compete with ONT - that said, I will wait on the wording. He also recommended a direct model download tool that can download the .tar.gz version of the models to the user history to ensure that they have the latest version of the models. This will be a separate project. Finally he confirmed that the model name uniquely identifies a model: i.e. there will not be "silent updates" of models. In conclusion, from this discussion I think:
|
Not sure. Why? |
Well, yes, lets wait for that, but as long as the model is supplied via a data manager any user, commercial or not, will not have direct access to it. |
As noted in #6547 the clair3 tool does not come with models for newer Oxford Nanopore flow cells (the R10 flow cells). These are provided by Oxford Nanopore on their Rerio page. This data manager fetches Clair3 models from that page.
NOTE: While this tool has no restrictions, the license on the Rerio page ("Oxford Nanopore Technologies, Ltd. Public License Version 1.0") is difficult to interpret and might not permit the use of the downloaded models in a commercial context. This is similar to 'research use only' tools like the BRAKER3 one (https://github.com/genouest/galaxy-tools/tree/master/tools/braker) and IUC folks need to decide on whether to include this DM in the repository, thus I have not checked the "License permits unrestricted use (educational + commercial)" box for now.
FOR CONTRIBUTOR: