datalayer
A data handle hides some details from a model - inputs are inputs, regardless of their source (from scenario data or other model outputs); outer iterations (decision/convergence) are hidden; current/previous/base timesteps are exposed as a convenience.
Input data and parameters both deal with the DataArray abstraction over multi-dimensional inputs.
State and interventions deal with the decisions available to decision/optimisation modules.
current_timestep: int
previous_timestep: int
base_timestep: int
timesteps: list[int]
get_data(input_name, timestep=None): DataArray
get_base_timestep_data(input_name): DataArray
get_previous_timestep_data(input_name): DataArray
get_parameters(): dict[str, DataArray]
get_parameter(parameter_name): DataArray
get_state(): list[BuildInstruction]
get_current_interventions(): list[Intervention]
set_results(output_name, data)
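For illustration, a simulation model might use the handle like this - the input, parameter and output names are invented for the sketch, and access to a DataArray's values via `.data` is an assumption:

```python
# Sketch only: "water_demand", "leakage_rate" and "water_supplied" are
# illustrative names, and .data is assumed to expose a DataArray's
# values as a numpy array.

def simulate(data_handle):
    # Current-timestep input, regardless of whether it comes from
    # scenario data or another model's output
    demand = data_handle.get_data("water_demand")

    # Parameters are DataArrays too
    leakage = data_handle.get_parameter("leakage_rate")

    supplied = demand.data * (1 - leakage.data)

    # Results are written back through the same handle
    data_handle.set_results("water_supplied", supplied)
```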
A ResultsHandle is available to decision/optimisation modules with a different level of access: results from any model/output, from any timestep/decision iteration that has already run, but no write access and no parameter/scenario data access.
get_results(model_name, output_name, timestep, decision_iteration): DataArray
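For example (the model and output names are invented), a decision module might read earlier results like this:

```python
# Sketch: "energy_demand" and "total_cost" are illustrative names.

def review_costs(results_handle, timestep, decision_iteration):
    # Read-only access to any result that has already been produced;
    # there is no write access and no parameter/scenario access here
    return results_handle.get_results(
        "energy_demand", "total_cost", timestep, decision_iteration)
```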
A Store handles data to configure, set up and execute model runs. Public methods are listed further below. A Store might be composed of a ConfigStore and a DataStore, which might have different implementations to suit storing the various types of data.
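One way to read that composition (a sketch of the idea, not the actual smif implementation) is a Store that delegates configuration methods to one component and data methods to the other:

```python
# Sketch of a Store composed of a ConfigStore and a DataStore; the
# method bodies here are illustrative, not smif's actual code.

class Store:
    def __init__(self, config_store, data_store):
        self.config_store = config_store
        self.data_store = data_store

    def read_model_run(self, model_run_name):
        # Configuration objects live in the config store
        return self.config_store.read_model_run(model_run_name)

    def read_results(self, modelrun_name, model_name, output_spec,
                     timestep=None, decision_iteration=None):
        # Bulk data (scenario data, results) lives in the data store
        return self.data_store.read_results(
            modelrun_name, model_name, output_spec,
            timestep, decision_iteration)
```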
Uses
- a modeller writes configuration data to the store, directly or through the app
- a model may reflect on configuration data to understand its inputs, outputs and parameters and their dimensions
- the smif model runner reads config to set up and run a ModelRun
Qualities
- config typically reflects smif object structures
- config objects often refer to others (as children or shared metadata)
Uses
- a modeller or data owner sets up input data, writing input data to the store
- a model reads input data, writes results data, and/or reads results data produced by other models
Qualities
- data typically has several dimensions, sometimes zero or one
- data can often be represented in 'tidy' columnar format
- metadata is sometimes shared between datasets
- there are often several variants of the same dataset (scenarios, parameterisations)
- data is sometimes sparse; sometimes it makes sense to use many default values and override targeted portions
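For example, a small two-dimensional dataset can be flattened into a 'tidy' table with one row per value - the dimension and variable names here are invented:

```python
# Sketch: the same data as a dense array and in tidy columnar form.
import numpy as np
import pandas as pd

data = np.array([[1.0, 2.0],
                 [3.0, 4.0]])  # dimensions: (region, season)

tidy = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "season": ["winter", "summer", "winter", "summer"],
    "water_demand": data.flatten(),
})
```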
There's some tension between normalising/denormalising metadata definitions:
- it's convenient and more immediately accessible to have self-describing data - e.g. storing datasets with geographical definitions always with geometries (in ShapeFiles, NetCDF, other as appropriate)
- where we have many variants of a given dataset, it seems wasteful of disk to duplicate the same definitions (e.g. geometries) with every variant
- when reconciling different model inputs and different data sources it's useful to have a clear way to point to shared definitions (or adjacent, differing definitions) of the metadata
Uses
- a modeller or data owner sets up input data with new dimensions (set of categories, spatial zones, timestep coverage)
- a modeller or data owner sets up new input data that should match some shared metadata
Qualities
- typically lists with short identifiers and potentially large descriptions
- spatial: vector geometries (with CRS), other attributes;
- temporal: interval definitions
- categorical: descriptions, short/long ids, cross-references
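For illustration, a categorical dimension definition might look something like this - the field names are assumptions for the sketch, not a fixed schema:

```python
# Sketch of a categorical dimension: short ids plus longer descriptions.
building_type = {
    "name": "building_type",
    "description": "Building categories used by a housing model",
    "elements": [
        {"name": "detached", "description": "Detached houses"},
        {"name": "semi", "description": "Semi-detached houses"},
        {"name": "flat", "description": "Purpose-built flats"},
    ],
}
```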
read_model_runs(): list[ModelRun]
read_model_run(model_run_name): ModelRun
write_model_run(model_run)
update_model_run(model_run_name, model_run)
delete_model_run(model_run_name)
read_sos_models(): list[SosModel]
read_sos_model(sos_model_name): SosModel
write_sos_model(sos_model)
update_sos_model(sos_model_name, sos_model)
delete_sos_model(sos_model_name)
read_sector_models(skip_coords=False): list[SectorModel]
read_sector_model(sector_model_name, skip_coords=False): SectorModel
write_sector_model(sector_model)
update_sector_model(sector_model_name, sector_model)
delete_sector_model(sector_model_name)
read_sector_model_parameter(sector_model_name, parameter_name): Spec
read_sector_model_parameter_default(sector_model_name, parameter_name): DataArray
write_sector_model_parameter_default(sector_model_name, parameter_name, data)
read_strategies(modelrun_name): list[Strategy]
write_strategies(modelrun_name, strategies)
read_interventions(sector_model_name): list[Intervention]
read_initial_conditions(sector_model_name): list[BuildInstruction]
read_all_initial_conditions(model_run_name): list[BuildInstruction]
read_state(modelrun_name, timestep, decision_iteration=None): list[Intervention]
write_state(state, modelrun_name, timestep, decision_iteration=None)
read_unit_definitions(): list[PintDefinitionString]
read_dimensions(): list[Coords]
read_dimension(dimension_name): Coords
write_dimension(dimension)
update_dimension(dimension_name, dimension)
delete_dimension(dimension_name)
read_coefficients(source_spec, destination_spec): numpy.ndarray
write_coefficients(source_spec, destination_spec, data)
read_scenarios(skip_coords=False): list[ScenarioModel]
read_scenario(scenario_name, skip_coords=False): ScenarioModel
write_scenario(scenario)
update_scenario(scenario_name, scenario)
delete_scenario(scenario_name)
read_scenario_variants(scenario_name): list[Variant]
read_scenario_variant(scenario_name, variant_name): Variant
write_scenario_variant(scenario_name, variant)
update_scenario_variant(scenario_name, variant_name, variant)
delete_scenario_variant(scenario_name, variant_name)
read_scenario_variant_data(scenario_name, variant_name, variable, timestep=None): DataArray
write_scenario_variant_data(scenario_name, variant_name, data_array, timestep=None)
read_narrative_variant_data(sos_model_name, narrative_name, variant_name,
write_narrative_variant_data(sos_model_name, narrative_name, variant_name,
read_results(modelrun_name, model_name, output_spec, timestep=None, decision_iteration=None): DataArray
write_results(data_array, modelrun_name, model_name, timestep=None, decision_iteration=None)
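As a hedged example of a round trip through a few of these methods - the shape of the dimension definition and the construction of output_spec are assumptions:

```python
# Sketch of typical Store usage; dimension fields and output_spec
# construction are assumptions for illustration.

def round_trip(store, output_spec):
    # A modeller or data owner registers shared metadata once...
    store.write_dimension({
        "name": "lad",
        "description": "Local authority districts",
        "elements": [{"name": "E06000001"}, {"name": "E06000002"}],
    })

    # ...and reads it back by name wherever it is referenced
    lad = store.read_dimension("lad")

    # Results are addressed by model run, model and output spec
    results = store.read_results(
        "energy_demand_run", "energy_demand", output_spec,
        timestep=2020, decision_iteration=0)
    return lad, results
```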
The database is one data store option that can be used by smif. The interface, part of the datalayer, provides an access point for managing the stored data.
A PostgreSQL relational database stores configuration data; the user can optionally build it locally.
The interface provides methods for the datalayer to write, read and manage data already in the database:
- writing
  - supports the writing of new data - should be passed as a dictionary
- reading
  - allows reading of data from the database - returned data is passed as a dictionary
- updating
  - allows the updating of data already in the database - pass a dictionary with only the data to be updated
  - to add/discuss: a second option to pass the full object definition, including values which are not to change?
- deleting
  - delete existing data from the database
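As a sketch of the partial-update behaviour described above, assuming the connection is a psycopg2 connection to the PostgreSQL store - the table and column names are invented, not smif's actual schema:

```python
# Sketch: only the keys present in the dictionary are updated; other
# columns keep their stored values. Table/column names are invented,
# and `connection` is assumed to be a psycopg2 connection.

def update_sector_model(connection, name, partial):
    assignments = ", ".join("{} = %s".format(key) for key in partial)
    with connection.cursor() as cursor:
        cursor.execute(
            "UPDATE sector_model SET {} WHERE name = %s".format(assignments),
            list(partial.values()) + [name],
        )
    connection.commit()
```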
Labels:
- data-layer
- smif
Notes from github: https://github.com/nismod/smif/pull/276#pullrequestreview-182210297
There is some code in the data_handle.get_data() method (_resolve_dependency) which seems to do something similar. Is it worth removing that, and ensuring that users explicitly define which data source to use in base timesteps and future timesteps, or do we want it to be resolved automatically in the data_handle?
...
The code in data_handle decides which source to pull from - scenario or previous model result - depending on whether we're in the base year. I think this is okay - it avoids having two differently-named inputs, one of which is only provided with data in the base year, the other only in non-base years.
I thought it was more ambiguous how best to handle a request for data from a timestep previous to the base timestep - always base_timestep - 1? base_timestep - inferred_timestep_stride? just base_timestep?
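A minimal sketch of the kind of resolution being discussed - this is not the actual _resolve_dependency code, just the decision it is described as making:

```python
# Sketch: choose the data source for an input depending on whether we
# are in the base timestep. Not the actual smif implementation.

def resolve_source(data_handle):
    if data_handle.current_timestep == data_handle.base_timestep:
        # No previous model results exist yet, so fall back to
        # scenario data in the base year
        return "scenario"
    # Otherwise the dependency is satisfied from the upstream model's
    # results at the previous timestep
    return "previous_model_results"
```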
Labels:
- data-layer
- smif
Add "initialize" method to Store to setup an empty store
-
A store should be responsible for setting up itself
-
a DataFileInterface should set up the folder structure it requires
-
a DatabaseInterface should set up the tables and connection it needs
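A sketch of what that could look like for a file-based store - the directory names are assumptions, not smif's actual layout:

```python
# Sketch of an initialize step for a file-based store.
import os


class DataFileInterface:
    def __init__(self, base_folder):
        self.base_folder = base_folder

    def initialize(self):
        # The store sets up the folder structure it requires, rather
        # than relying on the caller to create it first
        for subdir in ("config", "data", "results"):
            os.makedirs(os.path.join(self.base_folder, subdir),
                        exist_ok=True)
```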
Labels:
- data-layer
- smif
DataStore should be responsible for knowing where its files are; ConfigStore shouldn't care about data locations.
- should simplify Store implementation
- suggest compound key lookup, e.g. (a,b,c,d): file.ext, which could be stored in YAML at the root of the data file folder
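A sketch of the suggested compound-key lookup - the key structure and file names here are illustrative:

```python
# Sketch: a compound-key index held by the DataStore, e.g. loaded from
# a YAML file at the root of the data folder. Keys and files invented.
file_index = {
    ("population", "central", "population_count", 2020): "population_central_2020.csv",
    ("population", "central", "population_count", 2030): "population_central_2030.csv",
}


def locate(scenario, variant, variable, timestep):
    # Only the DataStore needs to know where files live; the
    # ConfigStore never sees these paths
    return file_index[(scenario, variant, variable, timestep)]
```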
Labels:
- data-layer
- smif
Ensure validation takes place at the Store layer
Labels:
- data-layer
- smif
Currently skipping TestWarmStart in test_data_store_csv.py as the implementation has changed - warm start is implemented in Store; DataStore implementations should only be concerned with reporting available_results (line 329 onwards).
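A sketch of that division of responsibility - apart from available_results, the names and the shape of its return value are assumptions:

```python
# Sketch: the Store decides how to warm-start; the DataStore only
# reports which results already exist. Assumes each available_results
# entry starts with the timestep.

class Store:
    def __init__(self, data_store):
        self.data_store = data_store

    def latest_completed_timestep(self, modelrun_name):
        available = self.data_store.available_results(modelrun_name)
        timesteps = [entry[0] for entry in available]
        return max(timesteps) if timesteps else None
```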