metadata export for drafts via API #11398

Open: wants to merge 8 commits into base: develop

@pdurbin (Member) commented Apr 3, 2025

What this PR does / why we need it:

#11305 explains that there are use cases where it is desirable to export metadata from datasets while they are still in draft. This pull request delivers this functionality, available via API.

Which issue(s) this PR closes:

Special notes for your reviewer:

Drafts are exported on-the-fly rather than being cached.

The export API is a bit non-standard in that it doesn't support our usual pattern of accepting either the database id or the PID of the dataset; only the PID is supported. I didn't try to address this. My changes are backward compatible.

Of the exporters in https://github.com/gdcc/dataverse-exporters, I only made a pull request to update the Croissant exporter. Once we merge this PR, perhaps we can create issues to update the remaining exporters as well. Also, I believe we need to add more docs in that repo to explain that if you upgrade to Dataverse 6.7, you should update exporter whatever to version whatever so that drafts are supported. I gave a heads-up about this in the release note snippet.

I had to edit src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/DataverseXoaiItemRepository.java. I'm not sure of the best way to test it.

I took a quick look at making the "export drafts" functionality available via the UI, but there are a few challenges:

  • We're trying to touch JSF as little as possible with the React UI on the horizon.
  • JSF constructs a URL that works well for published datasets. No API token is included. For drafts, I didn't want to introduce a security risk by simply adding the API token to the URL. I looked briefly at the "session user" concept in the Data Access API but it seems non-trivial to support it.
  • There's a fair amount of logic in the dataset page for exporting: (!DatasetPage.dataset.deaccessioned or (DatasetPage.workingVersion.deaccessioned and DatasetPage.canUpdateDataset())) and !DatasetPage.anonymizedAccess. I didn't want to break anything. By the way, the file page logic is simple (FilePage.fileMetadata.datasetVersion.dataset.released) but perhaps it should match the dataset page? 🤷
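
To make the difference between the two page conditions easier to see, here is a rough Python restatement of the expressions quoted above. It is only an illustrative sketch: `ExportLinkState`, `show_export_on_dataset_page`, and `show_export_on_file_page` are hypothetical names, and the real logic lives in the JSF pages, not in code like this.

```python
from dataclasses import dataclass


@dataclass
class ExportLinkState:
    """Hypothetical container for the flags used in the dataset page expression."""
    dataset_deaccessioned: bool          # DatasetPage.dataset.deaccessioned
    working_version_deaccessioned: bool  # DatasetPage.workingVersion.deaccessioned
    can_update_dataset: bool             # DatasetPage.canUpdateDataset()
    anonymized_access: bool              # DatasetPage.anonymizedAccess


def show_export_on_dataset_page(state: ExportLinkState) -> bool:
    """Mirror of the dataset page condition quoted in the bullet above."""
    return (
        not state.dataset_deaccessioned
        or (state.working_version_deaccessioned and state.can_update_dataset)
    ) and not state.anonymized_access


def show_export_on_file_page(dataset_released: bool) -> bool:
    """Mirror of the file page condition: only the released flag is checked."""
    return dataset_released
```

Note that neither condition mentions drafts explicitly, which is part of why changing them felt risky.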

Suggestions on how to test this:

  • Create a draft dataset.
  • Follow the updated API docs and download the draft.
  • Test all built-in exporters, but note that exporters that rely on ddi (ddi and html) and schema.org (schema.org and croissant at least) had to be updated.
  • Export drafts from the Croissant exporter, which you'll have to build yourself from gdcc/exporter-croissant#14 ("handle drafts").
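
For testers who prefer a script over curl, the request could be built roughly as follows. This is a minimal sketch under stated assumptions: the `/api/datasets/export` endpoint with `exporter` and `persistentId` parameters and the `X-Dataverse-key` header are the existing conventions, but the `version` parameter and its `:draft` value are assumptions here, so please check the preview docs linked under "Additional documentation" for the exact form. `build_export_request` is a hypothetical helper, and the server URL, PID, and token in the usage note are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(
    server_url: str,
    persistent_id: str,
    exporter: str,
    api_token: Optional[str] = None,
    version: Optional[str] = None,
) -> Request:
    """Build (but do not send) a GET request for the metadata export API."""
    # Only the PID is supported by this endpoint, not the database id.
    params = {"exporter": exporter, "persistentId": persistent_id}
    if version is not None:
        # Assumption: drafts are selected with a version parameter such as ":draft".
        params["version"] = version
    url = f"{server_url.rstrip('/')}/api/datasets/export?{urlencode(params)}"
    # Drafts are not public, so an API token is sent as a header, never in the URL.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return Request(url, headers=headers, method="GET")
```

To actually download a draft, pass the result to `urllib.request.urlopen`, for example `urlopen(build_export_request("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", "ddi", api_token="YOUR-TOKEN", version=":draft")).read()`. Leaving out `api_token` and `version` gives the same request as before this PR, which is a quick way to check backward compatibility.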

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Preview docs at https://dataverse-guide--11398.org.readthedocs.build/en/11398/api/native-api.html#export-metadata-of-a-dataset-in-various-formats

@github-actions bot added the Croissant (Croissant and Kaggle related work), FY25 Sprint 20 (2025-03-26 - 2025-04-09), Size: 20 (a percentage of a sprint; 14 hours), and Type: Feature (a feature request) labels Apr 3, 2025
@pdurbin moved this to In Progress 💻 in IQSS Dataverse Project Apr 3, 2025
@pdurbin self-assigned this Apr 3, 2025
@coveralls commented Apr 3, 2025

Coverage Status

coverage: 22.997% (-0.009%) from 23.006% when pulling 23c990d on 11305-export-drafts into e3bc7cf on develop.

@pdurbin force-pushed the 11305-export-drafts branch from c7a9458 to 75e6247 April 8, 2025 19:34

@pdurbin force-pushed the 11305-export-drafts branch from 75e6247 to 8602eff April 9, 2025 13:59

@pdurbin changed the title from "metadata export for drafts" to "metadata export for drafts via API" Apr 9, 2025
@pdurbin marked this pull request as ready for review April 9, 2025 18:42
@pdurbin moved this from In Progress 💻 to Ready for Review ⏩ in IQSS Dataverse Project Apr 9, 2025
@pdurbin removed their assignment Apr 9, 2025

@pdurbin removed the Size: 20 (a percentage of a sprint; 14 hours) label Apr 9, 2025
@pdurbin added the Size: 3 (a percentage of a sprint; 2.1 hours) label Apr 9, 2025

github-actions bot commented Apr 9, 2025

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11305-export-drafts
ghcr.io/gdcc/configbaker:11305-export-drafts

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@cmbz added the FY25 Sprint 21 (2025-04-09 - 2025-04-23) label Apr 9, 2025
@pdurbin added this to the 6.7 milestone Apr 14, 2025
Labels
Croissant (Croissant and Kaggle related work), FY25 Sprint 20 (2025-03-26 - 2025-04-09), FY25 Sprint 21 (2025-04-09 - 2025-04-23), Size: 3 (a percentage of a sprint; 2.1 hours), Type: Feature (a feature request)
Projects
Status: Ready for Review ⏩
Development

Successfully merging this pull request may close these issues.

Feature Request: Generate a Croissant metadata file (or any export format) before a dataset is published
4 participants