Implement data products being added from external objects such as papers #94

Open
soniamitchell opened this issue Jul 6, 2021 · 8 comments
Labels
enhancement (New feature or request), on hold

Comments

@soniamitchell
Contributor

No description provided.

@soniamitchell soniamitchell added the enhancement New feature or request label Jul 7, 2021
@soniamitchell
Contributor Author

Once adding papers to the registry with the CLI is implemented (and config.file format is agreed upon), we can implement this.

@richardreeve
Member

richardreeve commented Aug 6, 2021

I have (what I consider to be!) very silly/exciting ideas with this one - can we come up with a way of recursing through papers and their references using the citation information (if that information is available programmatically) to put the whole dependency tree of a paper into the system (up to a certain depth, perhaps)? Can we also add in the supplementary materials and so on, so you actually have the whole paper and its extra bits referenced sensibly in one place? Would that be useful if we did?
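
A minimal sketch of the depth-limited recursion floated above, assuming citation data can be fetched programmatically. The `get_references` callback, the function name `collect_references`, and the toy DOIs are all hypothetical placeholders, not part of the registry or any real citation service.

```python
from collections import deque


def collect_references(root_doi, get_references, max_depth=2):
    """Breadth-first walk of a citation graph, stopping at max_depth.

    get_references is a hypothetical callback: given a DOI string, it
    returns the list of DOIs that paper cites.
    Returns (seen, edges): seen maps each DOI to its depth from the root,
    edges is a list of (citing, cited) pairs.
    """
    seen = {root_doi: 0}
    edges = []
    queue = deque([root_doi])
    while queue:
        doi = queue.popleft()
        depth = seen[doi]
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this paper
        for ref in get_references(doi):
            edges.append((doi, ref))
            if ref not in seen:  # guards against citation cycles
                seen[ref] = depth + 1
                queue.append(ref)
    return seen, edges


# Toy citation data with made-up DOIs, including a cycle (d cites a).
TOY_CITATIONS = {
    "10.1/a": ["10.1/b", "10.1/c"],
    "10.1/b": ["10.1/c", "10.1/d"],
    "10.1/d": ["10.1/a"],
}


def toy_lookup(doi):
    return TOY_CITATIONS.get(doi, [])


if __name__ == "__main__":
    seen, edges = collect_references("10.1/a", toy_lookup, max_depth=2)
    print(seen)
    print(edges)
```

With `max_depth=2` the walk records papers a, b, c and d but does not expand d, so the cycle back to a is never followed. Each `(citing, cited)` edge could then become one registry entry, however those end up being stored.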

@soniamitchell
Contributor Author

Lol~ that actually sounds fun! Of course I need fair pull to implement paper imports first…

@richardreeve
Member

Yes, I'm totally with you on the "on hold" designation of this one...

@richardreeve
Member

As a thought experiment though, how would we turn a code run into something that would make sense for paper citations too while still doing its main job?

@soniamitchell
Contributor Author

I find it much easier to work this kind of stuff out during implementation. That way I can see the context, use an example, and see any problems which might arise. But I’ll play along..

My first thought would be to register papers in the same way issues are added. That is, via a script with no additional fields in the config file. Of course that means the DP API would need to register them. Why are external objects registered in pull again? Was it because the DP API should be able to run offline? If so, there should be a problem.

I would then include an optional DOI argument in write_array().

I’d also add this to milestone 1, since adding data from papers seems pretty basic.

@richardreeve
Member

richardreeve commented Aug 6, 2021

I see what you mean, and maybe we could do things that way (though there's definitely no time to add it to the 1.0 milestone!), but actually that's not quite what I was thinking of...

What I meant was that connecting papers and their citations involves no github repo, and potentially involves no config files or run scripts either, so if we want to use the code run registry table to make a paper an output and its references inputs, then we have to think about how that would work in the context of that table, or whether we would want to add a new table that specifically describes references rather than inputs.
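
To make the two options concrete, here is a rough sketch of both shapes. Everything in it is an assumption for illustration: the `CodeRun` and `Reference` classes and their field names are hypothetical stand-ins and do not reflect the actual registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodeRun:
    """Hypothetical stand-in for a code run record (not the real schema)."""
    description: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    # For a paper there is no repo, config file, or run script,
    # so these would all have to be optional.
    code_repo: Optional[str] = None
    config_file: Optional[str] = None
    submission_script: Optional[str] = None


@dataclass
class Reference:
    """Hypothetical row in a separate table describing citations."""
    citing: str
    cited: str


def paper_as_code_run(paper: str, cited: List[str]) -> CodeRun:
    """Option 1: reuse the code run shape, paper as output, references as inputs."""
    return CodeRun(
        description=f"Publication of {paper}",
        inputs=list(cited),
        outputs=[paper],
    )


def paper_as_references(paper: str, cited: List[str]) -> List[Reference]:
    """Option 2: a dedicated table with one row per citation."""
    return [Reference(citing=paper, cited=c) for c in cited]


if __name__ == "__main__":
    print(paper_as_code_run("doi:A", ["doi:B", "doi:C"]))
    print(paper_as_references("doi:A", ["doi:B", "doi:C"]))
```

Option 1 keeps a single provenance model but leaves the repo, config, and script fields empty for every paper. Option 2 keeps code runs strict at the cost of a second kind of dependency edge to query.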

The other use for this I've been thinking of for a while was being able to reference papers inside your code - so if you're implementing an algorithm or using a package in a file in your repo, you could cite it (using some clever syntax) at the point where you write that piece of code, and then (somehow, magically!) the pipeline will automatically pick up the dependencies and add the citation to the inputs / references list.
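
One way the "clever syntax" could look, purely as a sketch: a comment marker that a scanner collects before a run. The `# cite:` marker, the `find_citations` helper, and the DOIs in the example are invented for illustration; nothing like this exists in the pipeline.

```python
import re

# Hypothetical marker: a comment of the form "# cite: <DOI>".
CITE_PATTERN = re.compile(r"#\s*cite:\s*(10\.\d{4,9}/\S+)")


def find_citations(source):
    """Return (line_number, doi) pairs for every citation marker in source."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CITE_PATTERN.finditer(line):
            found.append((lineno, match.group(1)))
    return found


# Example source text; the DOIs are made-up placeholders.
EXAMPLE = '''\
def simulate(beta, gamma):
    # cite: 10.1234/example.sir
    return beta / gamma

import numpy  # cite: 10.5678/example.numpy
'''

if __name__ == "__main__":
    for lineno, doi in find_citations(EXAMPLE):
        print(lineno, doi)
```

For the example text this yields `(2, "10.1234/example.sir")` and `(5, "10.5678/example.numpy")`. The collected DOIs could then be appended to the inputs or references list of the run, depending on which table design is chosen.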

I appreciate I'm getting completely off topic here now, but I do wish it was easier to make sure that everything was correctly credited when I'm writing code... anyway, this isn't remotely high priority, I just thought it might be interesting to contemplate. You're probably right that it's easier to wait until we're actually trying to implement it.

@soniamitchell
Contributor Author

Ahh.. I assumed you meant more generally.

I’m not sure how useful having references as inputs and papers as outputs would be? Also not sure about adding references whilst coding. You might need to take me through that, but I’d be hesitant to add too many pieces of functionality that aren’t common use cases.

Doing an analysis / making a cool visualisation from the references sounds fun though. I’d be interested in that.
