Streaming dataset construction or appending to an existing dataset #5311

Open
quanvuong opened this issue Mar 10, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@quanvuong

Is your feature request related to a problem? Please describe.
I am always frustrated when I need to re-run tfds build every time new data samples become available. Doing so is time-consuming.

Describe the solution you'd like
The ability to append data to an existing TFDS dataset.

Describe alternatives you've considered
I am not sure if there are any alternatives.

@quanvuong quanvuong added the enhancement New feature or request label Mar 10, 2024
@tomvdw
Collaborator

tomvdw commented Mar 13, 2024

Do I understand correctly that you have a non-static data source from which you create a TFDS dataset? The data source regularly has new data appended to it. When new data is appended, you'd like TFDS to generate examples for those and append them to the TFDS dataset?

If so, is there a constant stream of new source data or is there new data on a regular basis, e.g., daily?

One consequence of this is that reading the same TFDS dataset on different days means that you'll read different data, i.e., model training would not be reproducible.
