Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data manager for clair3 models #6659

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

pvanheus
Copy link
Contributor

@pvanheus pvanheus commented Jan 7, 2025

As noted in #6547 the clair3 tool does not come with models for newer Oxford Nanopore flow cells (the R10 flow cells). These are provided by Oxford Nanopore on their Rerio page. This data manager fetches Clair3 models from that page.

NOTE: While this tool has no restrictions, the license on the Rerio page ("Oxford Nanopore Technologies, Ltd. Public License Version 1.0") is difficult to interpret and might not permit the use of the downloaded models in a commercial context. This is similar to 'research use only' tools like the BRAKER3 one (https://github.com/genouest/galaxy-tools/tree/master/tools/braker) and IUC folks need to decide on whether to include this DM in the repository, thus I have not checked the "License permits unrestricted use (educational + commercial)" box for now.

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

<requirement type="package" version="3.12">python</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
## this code looks up the existing table and uses it to build a list of known models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the known_models code doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ensures that the same model is not downloaded twice (duplicates are not allowed in the data table).

@@ -0,0 +1,6 @@
<tables>
<table name="clair3_models" comment_char="#" allow_duplicate_entries="False">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have assumed that Galaxy automatically does not allow duplicates if allow_duplicate_entries="False".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively the value_in_data_table might be easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it lets you provide duplicate entries and then gives an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_in_data_table works for manually entered values, but doesn't work for the latest mode - i.e. where the DM downloads the latest models.

@pvanheus
Copy link
Contributor Author

In a discussion with Jack Jacobs of ONT this morning he confirmed that ONT would prefer some kind of wording associated with a checkbox that users can check to acknowledge that they comply with the terms of ONT's license for these models. He will revert with specific wording. The main concern from ONT is not a commercial user simply using the models but rather a commercial user using the models to compete with ONT - that said, I will wait on the wording.

He also recommended a direct model download tool that can download the .tar.gz version of the models to the user history to ensure that they have the latest version of the models. This will be a separate project.

Finally he confirmed that the model name uniquely identifies a model: i.e. there will not be "silent updates" of models.

In conclusion, from this discussion I think:

  1. The wording is needed from ONT
  2. This tool will (like BRAKER3) likely need to find a home outside of tools-iuc.

@bernt-matthias
Copy link
Contributor

This tool will (like BRAKER3) likely need to find a home outside of tools-iuc.

Not sure. Why?

@wm75
Copy link
Contributor

wm75 commented Jan 16, 2025

The main concern from ONT is not a commercial user simply using the models but rather a commercial user using the models to compete with ONT - that said, I will wait on the wording.

Well, yes, lets wait for that, but as long as the model is supplied via a data manager any user, commercial or not, will not have direct access to it.
That would be different of course with the model download tool he suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants