I have a use case where we have a master dataset containing all columns and rows. This data is then used for bespoke downstream analyses, often using subsets of this data. I would like to use the same validation plan and data dictionary on these smaller datasets.
The functionality of pointblank already goes some way toward allowing this re-use, through functions such as has_columns(), set_tbl(), and yaml_write(); however, I don't think the package has yet fully embraced this approach.
I have made a couple of suggestions below that would make this use case fully possible; it just depends on whether it fits with your vision:
If set_tbl() allows you to change the target data frame, then is there any need to define the tbl when creating the agent/informant? It would allow full separation between the "recipe" and the tbl. As far as I can see, interrogate()/incorporate() already provide informative errors when the tbl defined isn't valid.
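As a minimal sketch of the recipe/tbl separation described above (the table names and validation steps here are hypothetical, and this assumes a recent pointblank version where create_agent() tolerates a NULL tbl until interrogation):

```r
library(pointblank)
library(dplyr)

# A reusable validation "recipe": create the agent with no target table,
# defining only the validation steps.
agent <-
  create_agent(
    tbl_name = "master_data",          # hypothetical name
    label = "Reusable validation plan"
  ) %>%
  col_vals_not_null(columns = vars(id)) %>%
  col_vals_gt(columns = vars(value), value = 0)

# Hypothetical master dataset and a downstream subset of it
master <- tibble(id = 1:5, value = c(2, 3, 1, 4, 5))
subset_tbl <- filter(master, id <= 3)

# Point the same plan at the full dataset, then at the subset,
# without redefining any validation steps.
agent %>% set_tbl(tbl = master) %>% interrogate()
agent %>% set_tbl(tbl = subset_tbl) %>% interrogate()
```

The same pattern would apply to an informant with incorporate() in place of interrogate().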
When writing an agent to YAML (pre-interrogation), the package does a great job of preserving all of the logic in preconditions and active, such that it can be re-used downstream. However, this isn't the case for informants, where info_columns() and info_snippet() don't seem to be recorded.
Recording the "recipes" for these objects to YAML would be really useful, and would obviate the need to create a separate package containing functions to regenerate the agents/informants.
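For concreteness, the informant round trip in question looks something like this (a sketch using pointblank's bundled small_table; the column info and snippet are made up for illustration):

```r
library(pointblank)

# Build an informant with column info and a snippet
informant <-
  create_informant(
    tbl = small_table,
    tbl_name = "small_table",
    label = "Data dictionary"
  ) %>%
  info_columns(columns = "a", info = "An integer index column.") %>%
  info_snippet(snippet_name = "row_count", fn = ~ nrow(.)) %>%
  incorporate()

# Write the informant to YAML, then read it back; the question is whether
# the info_columns()/info_snippet() logic survives this round trip so the
# recipe can be re-applied to a different (subset) table.
yaml_write(informant, filename = "informant.yml", path = tempdir())
informant2 <- yaml_read_informant(filename = "informant.yml", path = tempdir())
```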
jl5000 changed the title from "Fully embracing agent/informant 'recipes' and target tbls" to "Fully embracing separation of agent/informant 'recipes' and target tbls" on Feb 19, 2025.