Implement data products being added from external objects such as papers #94
Comments
Once adding papers to the registry with the CLI is implemented (and the config file format is agreed upon), we can implement this.
I have (what I consider to be!) very silly/exciting ideas with this one: can we come up with a way of recursing through papers and their references using the citation information (if that information is available programmatically) to put the whole dependency tree of a paper into the system (up to a certain depth, perhaps)? Can we also add in the supplementary materials and so on, so you actually have the whole paper and its extra bits referenced sensibly in one place? Would that be useful if we did?
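The depth-limited recursion suggested above amounts to a breadth-first walk over a citation graph. A minimal sketch, assuming a hypothetical get_references callable that returns the IDs (e.g. DOIs) a paper cites; no real citation service or registry API is implied:

```python
from collections import deque


def citation_tree(root, get_references, max_depth=2):
    """Breadth-first walk of a paper's references, up to max_depth levels.

    get_references is a hypothetical callable mapping a paper ID to the IDs
    it cites. Returns a list of (citing, cited) edges.
    """
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding once the depth limit is reached
        for ref in get_references(paper):
            edges.append((paper, ref))
            if ref not in seen:  # expand each paper only once
                seen.add(ref)
                queue.append((ref, depth + 1))
    return edges


# Toy citation graph standing in for real citation data.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
print(citation_tree("A", lambda p: graph.get(p, []), max_depth=2))
```

Each (citing, cited) pair could then become one link in the registry, with max_depth bounding how much of the tree gets pulled in.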
Lol~ that actually sounds fun! Of course, I need to implement paper imports in fair pull first…
Yes, I'm totally with you on the
As a thought experiment though, how would we turn a
I find it much easier to work this kind of stuff out during implementation. That way I can see the context, use an example, and spot any problems that might arise. But I'll play along. My first thought would be to register papers in the same way issues are added, that is, via a script with no additional fields in the config file. Of course, that means the DP API would need to register them. Why are external objects registered in pull again? Was it because the DP API should be able to run offline? If so, that would be a problem. I would then include an optional DOI argument in write_array(). I'd also add this to milestone 1, since adding data from papers seems pretty basic.
I see what you mean, and maybe we could do things that way (though there's definitely no time to add it to the 1.0 milestone!), but that's not quite what I was thinking of... What I meant was that connecting papers and their citations involves no GitHub repo, and potentially no config files or run scripts either. So if we want to use the code run registry table to make a paper an output and its references inputs, we have to think about how that would work in the context of that table, or whether we would want to add a new table that specifically describes references rather than inputs. The other use I've been thinking about for a while is being able to reference papers inside your code: if you're implementing an algorithm or using a package in a file in your repo, you could cite it (using some clever syntax) at the point where you write that piece of code, and then (somehow, magically!) the pipeline would automatically pick up the dependencies and add the citation to the inputs / references list. I appreciate I'm getting completely off topic now, but I do wish it were easier to make sure that everything is correctly credited when I'm writing code... Anyway, this isn't remotely high priority; I just thought it might be interesting to contemplate. You're probably right that it's easier to wait until we're actually trying to implement it.
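One possible shape for the "clever syntax" idea is a scanner that collects citations left in source comments. This sketch assumes a made-up "# cite: <DOI>" comment convention and uses placeholder DOIs; the convention is purely hypothetical, not an existing pipeline feature:

```python
import re

# Hypothetical convention: a comment of the form "# cite: <DOI>" marks a citation.
CITE_PATTERN = re.compile(r"#\s*cite:\s*(\S+)")


def find_citations(source: str) -> list[str]:
    """Return the unique DOIs cited in source, in first-seen order."""
    found: list[str] = []
    for match in CITE_PATTERN.finditer(source):
        doi = match.group(1)
        if doi not in found:
            found.append(doi)
    return found


# Placeholder DOIs; a repeated citation is reported only once.
example = """
import numpy as np  # cite: 10.1000/example.1
def fit():  # cite: 10.1000/example.2
    pass
# cite: 10.1000/example.1
"""
print(find_citations(example))
```

The pipeline could then treat the returned DOIs as references attached to the code run, though how they would map onto the registry tables is exactly the open question above.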
Ahh, I assumed you meant it more generally. I'm not sure how useful having references as inputs and papers as outputs would be. I'm also not sure about adding references whilst coding. You might need to take me through that, but I'd be hesitant to add too many pieces of functionality that aren't common use cases. Doing an analysis / making a cool visualisation from the references sounds fun though. I'd be interested in that.