Update `before_dataset_saved` to return data so mutations can be applied
#4450
Labels
Issue: Feature Request
New feature or improvement to existing feature
Description
I have a massive dataset that is too big to work on quickly if we use the full table. I wanted to write a hook that would kick in based on the run environment, intercept the data before it was saved, and add a `limit(n)` operation before the catalog saved it. To my surprise, it turned out that this wasn't possible. Another possible use case would be something like PII stripping before save.
I solved this problem with a custom dataset but it feels clunky.
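To make the request concrete, here is a rough sketch of the kind of hook I had in mind. It is not working Kedro code: `LimitRowsHook`, `active_env`, `limited_envs` and `max_rows` are names made up for illustration, a plain list slice stands in for `limit(n)`, and the class is left undecorated so the snippet runs without Kedro installed. It assumes the runner would use whatever the hook returns, which is exactly what this issue asks for.

```python
from typing import Any, Optional, Sequence


class LimitRowsHook:
    """Hypothetical hook that truncates data in selected environments."""

    def __init__(
        self,
        active_env: str,
        limited_envs: Sequence[str] = ("local",),
        max_rows: int = 1000,
    ) -> None:
        self.active_env = active_env
        self.limited_envs = tuple(limited_envs)
        self.max_rows = max_rows

    # In a real project this method would carry Kedro's hook decorator.
    # Today its return value is ignored; returning data is the proposal.
    def before_dataset_saved(
        self, dataset_name: str, data: Any, node: Any = None
    ) -> Optional[Any]:
        if self.active_env not in self.limited_envs:
            # None means "leave the data untouched".
            return None
        # Stand-in for limit(n): keep only the first max_rows items.
        return data[: self.max_rows]
```

Returning `None` to mean "no change" would keep every existing hook working as it does today.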
Context
The hook implementation (`hookimpl`), like all hooks actually, returns `None`, so a hook has no way to hand modified data back to the runner.
Possible Implementation
The runner's `task.py` calls this hook and could be tweaked to save the returned data whenever a hook returns something, falling back to the original data otherwise.
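The idea could look roughly like the sketch below. This is a simplified stand-in, not Kedro's actual runner code: `save_with_hooks`, `hooks` and `save` are hypothetical names, and the hooks are plain callables chained one after another rather than dispatched through the real plugin manager.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical hook signature: (dataset_name, data) -> new data or None.
BeforeSaveHook = Callable[[str, Any], Optional[Any]]


def save_with_hooks(
    dataset_name: str,
    data: Any,
    hooks: Iterable[BeforeSaveHook],
    save: Callable[[str, Any], None],
) -> Any:
    """Run each hook before saving and keep any non-None return value."""
    for hook in hooks:
        result = hook(dataset_name, data)
        if result is not None:
            # The hook chose to replace the data, so later hooks see it too.
            data = result
    save(dataset_name, data)
    return data
```

Hooks that return `None` leave the data alone, so existing implementations would not change behaviour. How several hooks that each return data should be combined is an open design question; the chaining above is only one option.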