Script to return plot given variable names, as well as data for said plot #134

Open
kbpi314 opened this issue Sep 6, 2023 · 5 comments
Collaborator

kbpi314 commented Sep 6, 2023

No description provided.

@kbpi314 kbpi314 added this to the 0.2.1 milestone Sep 6, 2023
Collaborator Author

kbpi314 commented Sep 7, 2023

Wanted to get input from @adamcantor22 and @cleme on the first part of this issue (returning the plot for given variable names). I avoided using the variable names in the plot file names because some of them are really long (e.g. taxa strings), so the files are named by the integer indices of the variables instead. I was planning to write a script that returns the plot path given input variable names, but this is also awkward when the names are long taxa strings. Would you recommend some kind of search feature (e.g. input Paraprevotella and get all plots whose variable names contain that substring), or requiring the complete variable name (which would have to go in a command line argument, I imagine)? If naming the files after the variables is the best approach, I'm also OK with that.

Member

cleme commented Sep 7, 2023

I would say a simple search: with long taxa names, users are likely to make typos if the exact string has to be provided.
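
For illustration, a minimal sketch of such a substring search. It assumes plots are keyed by the integer indices of their two variables, as described above; the function name `find_plots`, the `plots/<i>_<j>.png` naming, and the example variable names are hypothetical and not taken from CUTIE.

```python
from typing import Dict, List, Tuple


def find_plots(
    query: str,
    var_names: List[str],
    plot_paths: Dict[Tuple[int, int], str],
) -> List[Tuple[str, str, str]]:
    """Return (var1, var2, path) for every plot in which either
    variable name contains `query` (case-insensitive substring match)."""
    needle = query.lower()
    hits = []
    # Sort by index pair so results come back in a stable order.
    for (i, j), path in sorted(plot_paths.items()):
        if needle in var_names[i].lower() or needle in var_names[j].lower():
            hits.append((var_names[i], var_names[j], path))
    return hits


# Hypothetical example data: variable names and index-named plot files.
var_names = ["k__Bacteria;g__Paraprevotella", "k__Bacteria;g__Bacteroides", "BMI"]
plot_paths = {
    (0, 1): "plots/0_1.png",
    (0, 2): "plots/0_2.png",
    (1, 2): "plots/1_2.png",
}

matches = find_plots("paraprevotella", var_names, plot_paths)
for var1, var2, path in matches:
    print(var1, var2, path)
```

A case-insensitive match keeps the query forgiving, and returning the full variable names alongside each path lets the user confirm which taxa were matched.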

Collaborator Author

kbpi314 commented Sep 7, 2023

Makes sense. With regard to providing data, should the user have to specify the variables they want, or should the point data be provided for all plots? My concern is that with 10k+ plots, providing all those dataframes could be time- or space-consuming. CUTIE currently has a parameter that sets an upper bound on the number of plots to produce, so we could provide data only for the plots that were actually graphed.

Member

cleme commented Sep 7, 2023

I guess it depends on the size of the dataset: with few samples, each output file would not be that large, so even generating >10K plots would not be an issue. My recommendation would be to go for the solution that you think is easier to code now, and if space or time becomes a problem, we can modify the code to handle it.

Collaborator Author

kbpi314 commented Sep 7, 2023

Sounds good, I'll go with the output of a per-plot dataframe.
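
A rough sketch of what per-plot data output could look like, writing one small CSV per graphed plot and respecting an upper bound on the number of plots. The function name `write_plot_data`, the `max_plots` argument, the file naming, and the example data are all assumptions for illustration, not CUTIE's actual interface.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence, Tuple


def write_plot_data(
    samples: Sequence[str],
    data: Dict[str, Sequence[float]],
    plotted_pairs: List[Tuple[str, str]],
    out_dir: str,
    max_plots: int,
) -> List[str]:
    """Write one CSV (sample, x, y) per plotted variable pair,
    for at most `max_plots` pairs. Returns the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for k, (x_name, y_name) in enumerate(plotted_pairs[:max_plots]):
        # Index-based file name (hypothetical), since variable names can be long.
        path = os.path.join(out_dir, f"plot_{k}_data.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["sample", x_name, y_name])
            for sample, x_val, y_val in zip(samples, data[x_name], data[y_name]):
                writer.writerow([sample, x_val, y_val])
        written.append(path)
    return written


# Hypothetical example data: three samples, three variables.
samples = ["s1", "s2", "s3"]
data = {
    "Paraprevotella": [0.1, 0.4, 0.2],
    "Bacteroides": [0.5, 0.3, 0.6],
    "BMI": [22.0, 27.5, 24.1],
}
pairs = [("Paraprevotella", "BMI"), ("Bacteroides", "BMI"), ("Paraprevotella", "Bacteroides")]

out_dir = tempfile.mkdtemp()
written = write_plot_data(samples, data, pairs, out_dir, max_plots=2)
print(written)
```

Because only the first `max_plots` pairs are written, the data output stays in step with whatever cap is applied to the plots themselves.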
