preparation wrapper script for IMPROVE #299

Closed
ymahlich opened this issue Jan 16, 2025 · 4 comments · Fixed by #339
Labels
enhancement New feature or request

Comments

@ymahlich
Collaborator

A wrapper script needs to be created that uses the coderdata API to prepare datasets for ingestion by the IMPROVE model testing pipeline.
It will reside in the scripts folder of the repo.

@ymahlich ymahlich self-assigned this Jan 16, 2025
@ymahlich ymahlich added the enhancement New feature or request label Jan 16, 2025
@sgosline
Member

From Natasha:

x_data_canc_files = [["cancer_gene_expression.tsv", ["Gene_Symbol"]]]
x_data_drug_files = [["drug_mordred.tsv"]]
y_data_files = [["response.tsv"]]
train_split_file = CCLE_split_0_train.txt
val_split_file = CCLE_split_0_val.txt
test_split_file = CCLE_split_0_test.txt
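These parameters appear to follow an INI-style config convention, with list-valued entries written as Python literals and split files given as bare file names. Below is a minimal sketch of how a consumer could read them with Python's standard configparser and ast modules. It assumes the lines sit under a section header; the [Preprocess] section name and the load_params helper are hypothetical and not taken from IMPROVE.

```python
import ast
import configparser

# The parameter lines quoted above. The "[Preprocess]" header is a hypothetical
# addition: configparser requires every key to live inside a section.
CONFIG_TEXT = """
[Preprocess]
x_data_canc_files = [["cancer_gene_expression.tsv", ["Gene_Symbol"]]]
x_data_drug_files = [["drug_mordred.tsv"]]
y_data_files = [["response.tsv"]]
train_split_file = CCLE_split_0_train.txt
val_split_file = CCLE_split_0_val.txt
test_split_file = CCLE_split_0_test.txt
"""


def load_params(text: str, section: str = "Preprocess") -> dict:
    """Hypothetical helper: parse the section, turning list literals into lists."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    params = {}
    for key, raw in parser[section].items():
        # List-valued entries are Python literals; bare file names stay strings.
        params[key] = ast.literal_eval(raw) if raw.startswith("[") else raw
    return params


params = load_params(CONFIG_TEXT)
print(params["x_data_canc_files"])  # [['cancer_gene_expression.tsv', ['Gene_Symbol']]]
print(params["train_split_file"])   # CCLE_split_0_train.txt
```

The startswith("[") check is a simplification; a real loader would more likely type each key explicitly.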

@ymahlich
Collaborator Author

Per a conversation with Natasha:

  • The IDs in the files in splits/ are row numbers in data_y/response.tsv. This is a direct row-number reference, i.e. the "ID" does not exist anywhere else.
  • Mapping to omics data is done internally by their preparation method: split file row number -> response.tsv improve_sample_id/improve_chem_id -> individual entries in the corresponding omics data files, looked up by improve_sample_id and improve_chem_id.
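The lookup chain described above can be sketched with the standard library alone. This is an illustration of the mapping, not IMPROVE's actual preparation code. Several details are assumptions: each split file holds one integer per line, row numbers are 0-based over data rows (header excluded), the "auc" response column and all IDs and feature values are made up, and the feature files are stood in for by in-memory dicts keyed by ID.

```python
import csv
import io

# Hypothetical miniature response.tsv; only the two ID column names come from the issue.
RESPONSE_TSV = (
    "improve_sample_id\timprove_chem_id\tauc\n"
    "S1\tD1\t0.71\n"
    "S2\tD1\t0.55\n"
    "S1\tD2\t0.32\n"
)
SPLIT_TXT = "0\n2\n"  # assumed: one 0-based data-row index per line

# Stand-ins for rows of the omics and drug feature files, keyed by ID (made up).
SAMPLE_FEATURES = {"S1": [1.2, 0.4], "S2": [0.9, 2.1]}
DRUG_FEATURES = {"D1": [5.0], "D2": [3.5]}


def load_response(text: str) -> list:
    """Read response.tsv into row dicts; the list index is the row number."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def resolve_split(split_text: str, response_rows: list) -> list:
    """Map split row numbers -> (sample ID, drug ID) -> feature entries."""
    resolved = []
    for token in split_text.split():
        row = response_rows[int(token)]  # direct row-number reference
        sample_id = row["improve_sample_id"]
        chem_id = row["improve_chem_id"]
        resolved.append((sample_id, chem_id,
                         SAMPLE_FEATURES[sample_id], DRUG_FEATURES[chem_id],
                         float(row["auc"])))
    return resolved


rows = resolve_split(SPLIT_TXT, load_response(RESPONSE_TSV))
for item in rows:
    print(item)
```

Because the split files carry only row numbers, a wrapper producing them must write response.tsv in a fixed row order and never reorder it afterwards; otherwise every split silently points at different pairs.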

@sgosline
Member

Is this complete?

@ymahlich
Collaborator Author

I made some more fixes for Natasha the other day, but yes, this is complete once the changes are merged into the main branch.
I'll create a PR sometime today.

Projects
Status: Done

2 participants