@raphaelreinauer commented on Apr 14, 2024

This PR moves the dataset cloud from Google Cloud Storage (GCS) to Zenodo. Zenodo is an open, freely accessible platform for hosting and sharing datasets. It is free of charge and does not depend on a GCS account that could be deactivated, which helps keep the dataset cloud available in the long term.

When adding a new dataset, the following steps should be followed:

  1. Ensure you have access to Zenodo and obtain an access token.
  2. Use the DatasetUploader class to upload the dataset files to Zenodo, providing the necessary metadata and file paths.
  3. After a successful upload, a configuration file for the dataset is created automatically.
  4. Commit the generated configuration file to the repository as part of the PR.
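
To make steps 3 and 4 more concrete, here is a minimal sketch of what generating such a configuration file could look like. It does not call `DatasetUploader`, whose interface is not shown in this description. `DatasetConfig`, `build_config`, `write_config`, and all field names are hypothetical and only illustrate the idea of recording the Zenodo record identifier and per-file checksums; the schema actually produced by this PR may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class DatasetConfig:
    """Hypothetical config layout; the schema used in this PR may differ."""

    name: str
    record_id: str  # identifier of the published Zenodo record (assumed field)
    files: dict[str, str] = field(default_factory=dict)  # file name -> MD5 hex digest


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_config(name: str, record_id: str, paths: list[Path]) -> DatasetConfig:
    """Collect the file names and checksums of the uploaded files."""
    return DatasetConfig(name, record_id, {p.name: md5_of(p) for p in paths})


def write_config(config: DatasetConfig, target: Path) -> None:
    """Serialize the config as JSON so it can be committed to the repository."""
    target.write_text(json.dumps(asdict(config), indent=2, sort_keys=True) + "\n")
```

Storing checksums next to the record identifier would let a later download step detect corrupted or changed files without needing any credentials.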

Because the dataset configuration file is committed to the repository, everyone can access and use the dataset, even without a Zenodo access token. The configuration file contains the information needed to download the dataset from Zenodo.
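
As a rough illustration of why no token is needed for reading, the sketch below downloads the files of a published record over plain HTTPS and checks them against stored checksums. The `config` dictionary with its `record_id` and `files` keys is a hypothetical layout, not the format generated by this PR, and the URL pattern is an assumption about how Zenodo serves files of public records.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import quote

# Assumed URL pattern for files of a public Zenodo record.
ZENODO_FILE_URL = "https://zenodo.org/records/{record_id}/files/{name}?download=1"


def file_url(record_id: str, name: str) -> str:
    """Build the download URL for one file, percent-encoding the file name."""
    return ZENODO_FILE_URL.format(record_id=record_id, name=quote(name))


def verify_md5(path: Path, expected: str) -> bool:
    """Return True if the file's MD5 hex digest matches the expected value."""
    return hashlib.md5(path.read_bytes()).hexdigest() == expected


def download_dataset(config: dict, target_dir: Path) -> list[Path]:
    """Download every file listed in the (hypothetical) config; no token is sent."""
    target_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name, expected in config["files"].items():
        destination = target_dir / name
        urllib.request.urlretrieve(file_url(config["record_id"], name), str(destination))
        if not verify_md5(destination, expected):
            raise ValueError(f"Checksum mismatch for {name}")
        downloaded.append(destination)
    return downloaded
```

Only uploading requires authentication; reading a published record is an anonymous request, which is what makes a committed configuration file sufficient for all users.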

This change simplifies adding new datasets and makes them more easily available to all users.

    print("Using TPU!")
except ModuleNotFoundError:
    print("No TPUs...")
    pass

Why?

Collaborator Author

It's still WIP; I'll let you know once it's ready to review.

@raphaelreinauer changed the title from "WIP: Use zenodo instead of gcs for DatasetCloud" to "WIP: Move dataset cloud from GCS to Zenodo" on Apr 14, 2024