Feature Request: Generate a Croissant metadata file (or any export format) before a dataset is published #11305
Comments
I think #4132, "Allow to export metadata of unpublished datasets", is related, although some of the use cases might be different. #4372 is about how the metadata exports always contain metadata of the latest published version, even when a user is looking at an older version. I think it's a little less related but might be helpful to be aware of.
I would love to see support for generating citations and export files for all versions of a dataset. Ideally, these citations and exports should include version information. This enhancement would also help efforts like version DOIs (#4499).
For now I'm looking into exporting just drafts rather than all versions. Here's some work in progress:
Drafts are exported on-the-fly rather than being cached.
A draft PR for now:
This PR is ready for review:
@mrisdal you said "or API". 😄 Heads up that, at least as of this writing, the new "export drafts" functionality is API-only. (See the PR for why adding it to the UI is a bit complicated.)
@jggautier yes, highly related. I just left a comment linking to the new PR.
@jggautier right, I didn't touch the UI at all so I didn't try to address this. Maybe we can work on this with the new UI. This issue:
@johannes-darms I don't know about citations (@qqmyers did some recent work in #11163) but for exports, my new PR does not allow export for all versions of a dataset. Only latest published (as always) and draft (new!) are supported. For other versions, you're welcome to open an issue.
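For readers who want to try the API-only draft export described in the comments above, here is a rough sketch of what a request might look like. It assumes the existing `/api/datasets/export` endpoint and the `X-Dataverse-key` token header; the `version=:draft` query parameter and the `croissant` exporter name are assumptions, not something confirmed in this thread, so check the PR and the API guide for the actual names. The server URL and DOI are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_export_url(base_url, persistent_id, exporter="croissant", version=None):
    """Build a URL for the Dataverse dataset export endpoint."""
    params = {"exporter": exporter, "persistentId": persistent_id}
    if version is not None:
        # Assumption: the draft is selected with a `version` query parameter.
        # Check the PR and the API guide for the real name and value.
        params["version"] = version
    return f"{base_url.rstrip('/')}/api/datasets/export?{urlencode(params)}"


def fetch_export(url, api_token):
    """Fetch an export. Drafts are not public, so an API token is sent."""
    request = Request(url, headers={"X-Dataverse-key": api_token})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder server and DOI; replace with your own installation and dataset.
    url = build_export_url(
        "https://demo.dataverse.org",
        "doi:10.5072/FK2/EXAMPLE",
        version=":draft",
    )
    print(url)
    # croissant_json = fetch_export(url, api_token="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
```

Leaving out `version` in this sketch requests the latest published version, which matches the behavior the export API has always had.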
Overview of the Feature Request
Allow a depositor to download (via UI or API) the Croissant metadata file (or any metadata format) before the dataset is published. This is particularly important for datasets shared via Preview URL.
What kind of user is the feature intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
API User, Depositor, Guest (someone accessing a Dataset via Preview URL)
What inspired the request?
I'm a resource co-chair for the NeurIPS Datasets & Benchmarks track in 2025, and we are evaluating whether to recommend Dataverse as a repository to authors. This is part of a new requirement that authors generate and make accessible Croissant metadata representations of their datasets, so that we can automate validation of submissions and streamline the review process.
We expect that many authors of NeurIPS D&B track papers would choose to deposit their data via Harvard Dataverse because it offers a Preview URL feature, but it's a major limitation that a Croissant file is not generated. Authors will still need to manually generate them.
Additionally, Kaggle is another repository that offers Preview URLs and is adding the ability to download a Croissant file for such datasets.
Even if the feature isn't added for this year's CFPs, it will likely be useful for next year.
What existing behavior do you want changed?
Allow download of Croissant metadata and data files for unpublished Harvard Dataverse datasets.
Any brand new behavior do you want to add to Dataverse?
NA
Any open or closed issues related to this feature request?
After speaking with the team, I don't believe there are any.
Are you thinking about creating a pull request for this feature?
Help is always welcome, is this feature something you or your organization plan to implement?
No.