Add dataset_id to tracks #36

Open. Wants to merge 3 commits into main (base branch).

TamaraNaboulsi
Member

Description

A dataset_id field was added to the track model to store the UUID of the specific dataset that a track datafile is connected to. The DB migration was also applied to dev, with a default value for dataset_id. This migration should also be applied to staging and prod.
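
As a rough illustration of what a migration with a default value does to existing rows, here is a minimal sketch using Python's built-in sqlite3 rather than the project's Django migration. The table name, the other columns, and the placeholder UUID are all assumptions; the PR does not state the default value that was used.

```python
import sqlite3

# Hypothetical placeholder; the actual default used in the PR is not stated.
PLACEHOLDER_DATASET_ID = "00000000-0000-0000-0000-000000000000"


def add_dataset_id_column(conn: sqlite3.Connection) -> None:
    """Add a dataset_id column; existing rows pick up the default value."""
    conn.execute(
        "ALTER TABLE track ADD COLUMN dataset_id TEXT "
        f"DEFAULT '{PLACEHOLDER_DATASET_ID}'"
    )


# Stand-in for a table that already holds tracks before the migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (track_id TEXT PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO track VALUES ('t1', 'example track')")

add_dataset_id_column(conn)

# The pre-existing row now reports the placeholder dataset ID.
rows = conn.execute("SELECT track_id, dataset_id FROM track").fetchall()
print(rows)
```

Rows that exist before the column is added end up with the placeholder, which is why current tracks need their dataset IDs updated later.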

Review App URL(s)

http://add-dataset-id.review.ensembl.org

Knowledge Base

To apply the DB migration, the appropriate DB parameters must be defined first, either set manually as environment variables or exported from a file. An example of these parameters can be seen in the .env file. The following commands are then run to apply the migration.

python manage.py makemigrations
python manage.py migrate
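
One possible way to export the parameters from a file before running the commands above is sketched below. It assumes a simple KEY=VALUE file format; the variable names (DB_HOST, DB_PORT, DB_NAME) are hypothetical, so check the repository's .env file for the real ones.

```python
import os
from typing import Dict, MutableMapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        params[key.strip()] = value.strip().strip('"').strip("'")
    return params


def export_params(
    params: Dict[str, str], environ: MutableMapping[str, str] = os.environ
) -> None:
    """Copy parameters into the environment without overriding existing values."""
    for key, value in params.items():
        environ.setdefault(key, value)


# Hypothetical file contents; the real names live in the project's .env file.
example = """
# database connection
DB_HOST=localhost
DB_PORT=5432
DB_NAME="tracks"
"""

env = {"DB_HOST": "already-set"}  # stands in for os.environ in this demo
export_params(parse_env_text(example), env)
print(env)
```

Using setdefault means any parameter already set manually in the environment takes precedence over the file.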

Checklist

  • Black formatting
  • Tests

Contributor

@veidenberg veidenberg left a comment

Looks good to me.
Notes:

  • The DB migration commands need to be run on staging and prod after the PR is merged.
  • The Track API loading script needs an update to include dataset IDs in track submission payloads.
  • Current tracks have placeholder dataset IDs, which can be updated later.
  • Ensembl-client needs to be updated to add dataset IDs to Track API requests.
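
For the loading script note, a submission payload carrying a dataset ID might be built roughly as follows. Only dataset_id comes from this PR; the function name and the other payload fields (genome_id, label, datafile) are hypothetical and not taken from the Track API schema.

```python
import json
import uuid
from typing import Optional


def build_track_payload(
    genome_id: str,
    label: str,
    datafile: str,
    dataset_id: Optional[str] = None,
) -> dict:
    """Build a track submission payload that includes dataset_id."""
    if dataset_id is not None:
        # uuid.UUID raises ValueError on malformed input and normalises the form.
        dataset_id = str(uuid.UUID(dataset_id))
    return {
        "genome_id": genome_id,
        "label": label,
        "datafile": datafile,
        "dataset_id": dataset_id,
    }


payload = build_track_payload(
    genome_id="example-genome",
    label="example track",
    datafile="example.bb",
    dataset_id="2F1C3A6E-9B4D-4E7A-8C21-5D0F6A7B8C9D",
)
print(json.dumps(payload, indent=2))
```

Validating the UUID in the loading script would surface malformed IDs before anything is submitted.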

@Mehrnaz-Charkhchi Mehrnaz-Charkhchi left a comment

How is TrackAPI going to handle the track that GB needs to use for the latest release?
For example, Release 6 introduces new datasets for regulation tracks, while older versions of these regulation tracks (with previous datasets) already exist. After loading data into TrackAPI, we'll have tracks of the same type but with different dataset UUIDs.

How will TrackAPI determine which track version (dataset) should be used for the latest release?
Is this logic handled within TrackAPI, or will the FE manage the selection?

@Mehrnaz-Charkhchi Mehrnaz-Charkhchi left a comment

One more question: do we have tracks that won't have dataset UUIDs? One example I can think of is GC tracks. Do we need to allow null for dataset UUIDs?

@veidenberg
Contributor

@Mehrnaz-Charkhchi
Tracks are requested from Track API by the Ensembl client, which should pass a dataset ID together with the genome ID. That means the track_categories endpoint needs to be updated to accept and filter by a dataset ID. Note: if no dataset ID is provided and a genome ID has tracks with multiple dataset IDs, it currently returns all versions of the track. Not sure how to work around that.
Making dataset IDs nullable sounds better than the placeholder UUIDs (for tracks without a dataset ID).
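
The filtering behaviour described here could look roughly like the sketch below, written in plain Python rather than against the actual Django queryset. The Track fields and the select_tracks name are hypothetical. Returning tracks with a null dataset ID alongside the requested dataset is an assumption of this sketch, not something decided in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Track:
    track_id: str
    genome_id: str
    dataset_id: Optional[str]  # None for tracks without a dataset, e.g. GC


def select_tracks(
    tracks: List[Track], genome_id: str, dataset_id: Optional[str] = None
) -> List[Track]:
    """Filter by genome ID and, when given, by dataset ID."""
    matches = [t for t in tracks if t.genome_id == genome_id]
    if dataset_id is None:
        # No dataset ID supplied: every version of every track is returned.
        return matches
    # Assumption: dataset-less tracks are kept whatever dataset is requested.
    return [t for t in matches if t.dataset_id in (dataset_id, None)]


tracks = [
    Track("reg-old", "g1", "d1"),
    Track("reg-new", "g1", "d2"),
    Track("gc", "g1", None),
    Track("other", "g2", "d2"),
]

print([t.track_id for t in select_tracks(tracks, "g1")])
print([t.track_id for t in select_tracks(tracks, "g1", "d2")])
```

Without a dataset ID the caller still receives both versions of the regulation track, which is the open problem noted above.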

Contributor

@veidenberg veidenberg left a comment

Some updates from the feedback:

  • Add dataset_id parameter to track_categories endpoint
  • Make dataset_id field nullable
