Implement data products being added from external objects such as papers #94

soniamitchell · 2021-07-06T16:28:28Z

No description provided.

soniamitchell · 2021-08-06T09:51:04Z

Once adding papers to the registry with the CLI is implemented (and the config file format is agreed upon), we can implement this.

richardreeve · 2021-08-06T14:40:07Z

I have (what I consider to be!) very silly/exciting ideas with this one - can we come up with a way of recursing through papers and their references using the citation information (if that information is available programmatically) to put the whole dependency tree of a paper into the system (up to a certain depth, perhaps)? Can we also add in the supplementary materials and so on, so you actually have the whole paper and its extra bits referenced sensibly in one place? Would that be useful if we did?
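A rough sketch of the "recurse through references up to a certain depth" idea, assuming citation information can be looked up programmatically. Everything here is hypothetical: the `CITATIONS` dictionary stands in for whatever citation service might be used, the DOIs are placeholders, and `reference_tree` is an illustrative name, not part of the pipeline.

```python
from collections import deque

# Hypothetical citation data: DOI -> list of DOIs it cites.
# In practice this would come from a citation service, if one is available.
CITATIONS = {
    "10.1000/paper-a": ["10.1000/paper-b", "10.1000/paper-c"],
    "10.1000/paper-b": ["10.1000/paper-d"],
    "10.1000/paper-c": ["10.1000/paper-d"],
    "10.1000/paper-d": ["10.1000/paper-e"],
}


def reference_tree(root, lookup, max_depth):
    """Breadth-first walk of references, stopping at max_depth.

    Returns a dict mapping each reached DOI to its depth from root.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        doi = queue.popleft()
        if depths[doi] >= max_depth:
            continue  # do not expand papers at the depth limit
        for ref in lookup(doi):
            if ref not in depths:  # skip papers already seen (handles cycles)
                depths[ref] = depths[doi] + 1
                queue.append(ref)
    return depths


tree = reference_tree(
    "10.1000/paper-a", lambda doi: CITATIONS.get(doi, []), max_depth=2
)
```

With a depth limit of 2, the walk reaches papers b, c and d but stops before paper e. Tracking visited DOIs matters because citation graphs can share references and, through preprints or errata, may even contain cycles.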

soniamitchell · 2021-08-06T14:51:26Z

Lol~ that actually sounds fun! Of course I need fair pull to implement paper imports first…

richardreeve · 2021-08-06T15:01:26Z

Yes, I'm totally with you on the "on hold" designation of this one...

richardreeve · 2021-08-06T15:03:13Z

As a thought experiment though, how would we turn a code run into something that would make sense for paper citations too while still doing its main job?

soniamitchell · 2021-08-06T16:15:29Z

I find it much easier to work this kind of stuff out during implementation. That way I can see the context, use an example, and see any problems which might arise. But I’ll play along...

My first thought would be to register papers in the same way issues are added. That is, via a script with no additional fields in the config file. Of course that means the DP API would need to register them. Why are external objects registered in pull again? Was it because the DP API should be able to run offline? If so, there should be a problem.

I would then include an optional DOI argument in write_array().
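To make the optional DOI argument concrete, here is a minimal sketch. It is a stand-in only: this is not the pipeline's actual write_array() signature, the parameter names are assumptions, and it builds a metadata record rather than writing any file.

```python
from typing import Optional, Sequence


def write_array(
    array: Sequence[float],
    data_product: str,
    component: str,
    doi: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in: returns a metadata record instead of writing data."""
    record = {
        "data_product": data_product,
        "component": component,
        "n_values": len(array),
    }
    if doi is not None:
        # Only attach a source paper when the caller supplies one.
        record["source_doi"] = doi
    return record


# Existing calls keep working because the new argument defaults to None.
without_doi = write_array([1.0, 2.0], "example/product", "component1")
with_doi = write_array(
    [1.0, 2.0], "example/product", "component1", doi="10.1000/example"
)
```

Making the argument optional with a default of None means no existing call sites would need to change.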

I’d also add this to milestone 1, since adding data from papers seems pretty basic.

richardreeve · 2021-08-06T17:14:50Z

I see what you mean, and maybe we could do things that way (though there's definitely no time to add it to the 1.0 milestone!), but actually that's not quite what I was thinking of...

What I meant was that connecting papers and their citations involves no GitHub repo, and potentially no config files or run scripts either. So if we want to use the code run registry table to make a paper an output and its references inputs, we have to think about how that would work in the context of that table, or whether we would want to add a new table that specifically describes references rather than inputs.
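The two options could be compared with a sketch like the following. The class and field names are invented for illustration and do not reflect the registry's real schema; the DOIs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodeRunLike:
    """Option 1: reuse a code-run-style record; repo and config become optional."""
    outputs: List[str]
    inputs: List[str] = field(default_factory=list)
    repo: Optional[str] = None    # no GitHub repo for a paper
    config: Optional[str] = None  # no config file or run script either


@dataclass
class Reference:
    """Option 2: a dedicated table row linking a citing paper to a cited one."""
    citing: str
    cited: str


paper = "10.1000/paper-a"
refs = ["10.1000/paper-b", "10.1000/paper-c"]

# Option 1: the paper is the output, its references are the inputs.
as_code_run = CodeRunLike(outputs=[paper], inputs=refs)

# Option 2: one row per citation, independent of any code run.
as_references = [Reference(citing=paper, cited=ref) for ref in refs]
```

Option 1 keeps a single table but leaves several fields empty for papers; option 2 adds a table but keeps "input" and "reference" as distinct concepts.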

The other use for this that I've been thinking about for a while is being able to reference papers inside your code. If you're implementing an algorithm or using a package in a file in your repo, you could cite it (using some clever syntax) at the point where you write that piece of code, and then (somehow, magically!) the pipeline would automatically pick up the dependencies and add the citation to the inputs / references list.
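As a thought experiment only, the "clever syntax" could be as simple as a comment marker that a scanner collects. The `# cite: <DOI>` convention and the `find_citations` helper below are assumptions made up for this sketch, not an existing pipeline feature, and the DOIs are placeholders.

```python
import re

# Hypothetical marker: a comment of the form "# cite: <DOI>".
CITE_PATTERN = re.compile(r"#\s*cite:\s*(\S+)")


def find_citations(source: str) -> list:
    """Return cited DOIs in order of first appearance, without duplicates."""
    seen = []
    for match in CITE_PATTERN.finditer(source):
        doi = match.group(1)
        if doi not in seen:
            seen.append(doi)
    return seen


example = '''
def smooth(x):
    # cite: 10.1000/smoothing-paper
    return x

def fit(x):
    # cite: 10.1000/fitting-paper
    # cite: 10.1000/smoothing-paper
    return x
'''

dois = find_citations(example)
```

The resulting list could then be handed to whatever step registers inputs or references, so credit is recorded where the code is written rather than added by hand later.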

I appreciate I'm getting completely off topic here now, but I do wish it was easier to make sure that everything was correctly credited when I'm writing code... anyway, this isn't remotely high priority, I just thought it might be interesting to contemplate. You're probably right that it's easier to wait until we're actually trying to implement it.

soniamitchell · 2021-08-06T19:34:27Z

Ahh.. I assumed you meant more generally.

I’m not sure how useful having references as inputs and papers as outputs would be? Also not sure about adding references whilst coding. You might need to take me through that, but I’d be hesitant to add too many pieces of functionality that aren’t common use cases.

Doing an analysis / making a cool visualisation from the references sounds fun though. I’d be interested in that.

soniamitchell added the enhancement label (New feature or request) · Jul 7, 2021

soniamitchell added the on hold label · Aug 6, 2021
