datalayer
A data handle hides some details from a model - inputs are inputs, regardless of their source (from scenario data or other model outputs); outer iterations (decision/convergence) are hidden; current/previous/base timesteps are exposed as a convenience.
Input data and parameters both deal with the DataArray abstraction over multi-dimensional inputs.
State and interventions deal with the decisions available to decision/optimisation modules.
current_timestep: int
previous_timestep: int
base_timestep: int
timesteps: list[int]
get_data(input_name, timestep=None): DataArray
get_base_timestep_data(input_name): DataArray
get_previous_timestep_data(input_name): DataArray
get_parameters(): dict[str, DataArray]
get_parameter(parameter_name): DataArray
get_state(): list[BuildInstruction]
get_current_interventions(): list[Intervention]
set_results(output_name, data)
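For illustration, a simulation model might use the handle like this - the input, parameter and output names are invented for the sketch, and access to a DataArray's values via `.data` is an assumption:

```python
# Sketch only: "water_demand", "leakage_rate" and "water_supplied" are
# illustrative names, and .data is assumed to expose a DataArray's
# values as a numpy array.

def simulate(data_handle):
    # Current-timestep input, regardless of whether it comes from
    # scenario data or another model's output
    demand = data_handle.get_data("water_demand")

    # Parameters are DataArrays too
    leakage = data_handle.get_parameter("leakage_rate")

    supplied = demand.data * (1 - leakage.data)

    # Results are written back through the same handle
    data_handle.set_results("water_supplied", supplied)
```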
A ResultsHandle is available to decision/optimisation modules with a different level of access: results from any model/output, from any timestep/decision iteration that has already run, but no write access and no parameter/scenario data access.
get_results(model_name, output_name, timestep, decision_iteration): DataArray
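For example (the model and output names are invented), a decision module might read earlier results like this:

```python
# Sketch: "energy_demand" and "total_cost" are illustrative names.

def review_costs(results_handle, timestep, decision_iteration):
    # Read-only access to any result that has already been produced;
    # there is no write access and no parameter/scenario access here
    return results_handle.get_results(
        "energy_demand", "total_cost", timestep, decision_iteration)
```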
A Store handles data to configure, set up and execute model runs. Public methods are listed further below. A Store might be composed of a ConfigStore and a DataStore, which might have different implementations to suit storing the various types of data.
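One way to read that composition (a sketch of the idea, not the actual smif implementation) is a Store that delegates configuration methods to one component and data methods to the other:

```python
# Sketch of a Store composed of a ConfigStore and a DataStore; the
# method bodies here are illustrative, not smif's actual code.

class Store:
    def __init__(self, config_store, data_store):
        self.config_store = config_store
        self.data_store = data_store

    def read_model_run(self, model_run_name):
        # Configuration objects live in the config store
        return self.config_store.read_model_run(model_run_name)

    def read_results(self, modelrun_name, model_name, output_spec,
                     timestep=None, decision_iteration=None):
        # Bulk data (scenario data, results) lives in the data store
        return self.data_store.read_results(
            modelrun_name, model_name, output_spec,
            timestep, decision_iteration)
```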
Uses
- a modeller writes configuration data to the store, directly or through the app
- a model may reflect on configuration data to understand its inputs, outputs and parameters and their dimensions
- the smif model runner reads config to set up and run a ModelRun
Qualities
- config typically reflects smif object structures
- config objects often refer to others (as children or shared metadata)
Uses
- a modeller or data owner sets up input data, writing input data to the store
- a model reads input data, writes results data, and/or reads results data produced by other models
Qualities
- data typically has several dimensions, sometimes zero or one
- data can often be represented in 'tidy' columnar format
- metadata is sometimes shared between datasets
- there are often several variants of the same dataset (scenarios, parameterisations)
- data is sometimes sparse; sometimes it makes sense to use many default values and override targeted portions
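For example, a small two-dimensional dataset can be flattened into a 'tidy' table with one row per value - the dimension and variable names here are invented:

```python
# Sketch: the same data as a dense array and in tidy columnar form.
import numpy as np
import pandas as pd

data = np.array([[1.0, 2.0],
                 [3.0, 4.0]])  # dimensions: (region, season)

tidy = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "season": ["winter", "summer", "winter", "summer"],
    "water_demand": data.flatten(),
})
```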
There's some tension between normalising/denormalising metadata definitions:
- it's convenient and more immediately accessible to have self-describing data - e.g. storing datasets with geographical definitions always with geometries (in ShapeFiles, NetCDF, other as appropriate)
- where we have many variants of a given dataset, it seems wasteful of disk to duplicate the same definitions (e.g. geometries) with every variant
- when reconciling different model inputs and different data sources it's useful to have a clear way to point to shared definitions (or adjacent, differing definitions) of the metadata
Uses
- a modeller or data owner sets up input data with new dimensions (set of categories, spatial zones, timestep coverage)
- a modeller or data owner sets up new input data that should match some shared metadata
Qualities
- typically lists with short identifiers and potentially large descriptions
- spatial: vector geometries (with CRS), other attributes;
- temporal: interval definitions
- categorical: descriptions, short/long ids, cross-references
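For illustration, a categorical dimension definition might look something like this - the field names are assumptions for the sketch, not a fixed schema:

```python
# Sketch of a categorical dimension: short ids plus longer descriptions.
building_type = {
    "name": "building_type",
    "description": "Building categories used by a housing model",
    "elements": [
        {"name": "detached", "description": "Detached houses"},
        {"name": "semi", "description": "Semi-detached houses"},
        {"name": "flat", "description": "Purpose-built flats"},
    ],
}
```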
read_model_runs(): list[ModelRun]
read_model_run(model_run_name): ModelRun
write_model_run(model_run)
update_model_run(model_run_name, model_run)
delete_model_run(model_run_name)
read_sos_models(): list[SosModel]
read_sos_model(sos_model_name): SosModel
write_sos_model(sos_model)
update_sos_model(sos_model_name, sos_model)
delete_sos_model(sos_model_name)
read_sector_models(skip_coords=False): list[SectorModel]
read_sector_model(sector_model_name, skip_coords=False): SectorModel
write_sector_model(sector_model)
update_sector_model(sector_model_name, sector_model)
delete_sector_model(sector_model_name)
read_sector_model_parameter(sector_model_name, parameter_name): Spec
read_sector_model_parameter_default(sector_model_name, parameter_name): DataArray
write_sector_model_parameter_default(sector_model_name, parameter_name, data)
read_strategies(modelrun_name): list[Strategy]
write_strategies(modelrun_name, strategies)
read_interventions(sector_model_name): list[Intervention]
read_initial_conditions(sector_model_name): list[BuildInstruction]
read_all_initial_conditions(model_run_name): list[BuildInstruction]
read_state(modelrun_name, timestep, decision_iteration=None): list[Intervention]
write_state(state, modelrun_name, timestep, decision_iteration=None)
read_unit_definitions(): list[PintDefinitionString]
read_dimensions(): list[Coords]
read_dimension(dimension_name): Coords
write_dimension(dimension)
update_dimension(dimension_name, dimension)
delete_dimension(dimension_name)
read_coefficients(source_spec, destination_spec): numpy.ndarray
write_coefficients(source_spec, destination_spec, data)
read_scenarios(skip_coords=False): list[ScenarioModel]
read_scenario(scenario_name, skip_coords=False): ScenarioModel
write_scenario(scenario)
update_scenario(scenario_name, scenario)
delete_scenario(scenario_name)
read_scenario_variants(scenario_name): list[Variant]
read_scenario_variant(scenario_name, variant_name): Variant
write_scenario_variant(scenario_name, variant)
update_scenario_variant(scenario_name, variant_name, variant)
delete_scenario_variant(scenario_name, variant_name)
read_scenario_variant_data(scenario_name, variant_name, variable, timestep=None): DataArray
write_scenario_variant_data(scenario_name, variant_name, data_array, timestep=None)
read_narrative_variant_data(sos_model_name, narrative_name, variant_name,
write_narrative_variant_data(sos_model_name, narrative_name, variant_name,
read_results(modelrun_name, model_name, output_spec, timestep=None, decision_iteration=None): DataArray
write_results(data_array, modelrun_name, model_name, timestep=None, decision_iteration=None)
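As a hedged example of a round trip through a few of these methods - the shape of the dimension definition and the construction of output_spec are assumptions:

```python
# Sketch of typical Store usage; dimension fields and output_spec
# construction are assumptions for illustration.

def round_trip(store, output_spec):
    # A modeller or data owner registers shared metadata once...
    store.write_dimension({
        "name": "lad",
        "description": "Local authority districts",
        "elements": [{"name": "E06000001"}, {"name": "E06000002"}],
    })

    # ...and reads it back by name wherever it is referenced
    lad = store.read_dimension("lad")

    # Results are addressed by model run, model and output spec
    results = store.read_results(
        "energy_demand_run", "energy_demand", output_spec,
        timestep=2020, decision_iteration=0)
    return lad, results
```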
The database is one data store option that can be used by smif. The interface, part of the datalayer, provides an access point for managing the stored data.
A PostgreSQL relational database stores configuration data; the user can optionally build it locally.
The interface provides methods for the datalayer to write, read and manage data already in the database:
- writing
  - supports the writing of new data - should be passed as a dictionary
- reading
  - allows reading of data from the database - returned data is passed as a dictionary
- updating
  - allows the updating of data already in the database - pass a dictionary with only the data to be updated
  - to add/discuss: a second option to pass the full object definition, including values which are not to change?
- deleting
  - delete existing data from the database
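As a sketch of the partial-update behaviour described above, assuming the connection is a psycopg2 connection to the PostgreSQL store - the table and column names are invented, not smif's actual schema:

```python
# Sketch: only the keys present in the dictionary are updated; other
# columns keep their stored values. Table/column names are invented,
# and `connection` is assumed to be a psycopg2 connection.

def update_sector_model(connection, name, partial):
    assignments = ", ".join("{} = %s".format(key) for key in partial)
    with connection.cursor() as cursor:
        cursor.execute(
            "UPDATE sector_model SET {} WHERE name = %s".format(assignments),
            list(partial.values()) + [name],
        )
    connection.commit()
```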
Labels:
- data-layer
- smif
Notes from github: https://github.com/nismod/smif/pull/276#pullrequestreview-182210297
There is some code in the data_handle.get_data() method (_resolve_dependency) which seems to do something similar. Is it worth removing that, and ensuring that users explicitly define which data source to use in base timesteps and future timesteps, or do we want it to be resolved automatically in the data_handle?
...
The code in data_handle decides which source to pull from - scenario or previous model result - depending on whether we're in the base year. I think this is okay - it avoids having two differently-named inputs, one of which is only provided with data in the base year, the other only in non-base years.
I thought it was more ambiguous how best to handle a request for data from a timestep previous to the base timestep - always base_timestep - 1? base_timestep - inferred_timestep_stride? just base_timestep?
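A minimal sketch of the kind of resolution being discussed - this is not the actual _resolve_dependency code, just the decision it is described as making:

```python
# Sketch: choose the data source for an input depending on whether we
# are in the base timestep. Not the actual smif implementation.

def resolve_source(data_handle):
    if data_handle.current_timestep == data_handle.base_timestep:
        # No previous model results exist yet, so fall back to
        # scenario data in the base year
        return "scenario"
    # Otherwise the dependency is satisfied from the upstream model's
    # results at the previous timestep
    return "previous_model_results"
```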
Labels:
- data-layer
- smif
Add "initialize" method to Store to setup an empty store
-
A store should be responsible for setting up itself
-
a DataFileInterface should set up the folder structure it requires
-
a DatabaseInterface should set up the tables and connection it needs
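A sketch of what that could look like for a file-based store - the directory names are assumptions, not smif's actual layout:

```python
# Sketch of an initialize step for a file-based store.
import os


class DataFileInterface:
    def __init__(self, base_folder):
        self.base_folder = base_folder

    def initialize(self):
        # The store sets up the folder structure it requires, rather
        # than relying on the caller to create it first
        for subdir in ("config", "data", "results"):
            os.makedirs(os.path.join(self.base_folder, subdir),
                        exist_ok=True)
```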
Labels:
- data-layer
- smif
DataStore should be responsible for knowing where its files are; ConfigStore shouldn't care about data locations.
- should simplify Store implementation
- suggest compound key lookup, e.g. (a,b,c,d): file.ext, which could be stored in YAML at the root of the data file folder
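A sketch of the suggested compound-key lookup - the key structure and file names here are illustrative:

```python
# Sketch: a compound-key index held by the DataStore, e.g. loaded from
# a YAML file at the root of the data folder. Keys and files invented.
file_index = {
    ("population", "central", "population_count", 2020): "population_central_2020.csv",
    ("population", "central", "population_count", 2030): "population_central_2030.csv",
}


def locate(scenario, variant, variable, timestep):
    # Only the DataStore needs to know where files live; the
    # ConfigStore never sees these paths
    return file_index[(scenario, variant, variable, timestep)]
```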
Labels:
- data-layer
- smif
Ensure validation takes place at the Store layer
Labels:
- data-layer
- smif
Currently skipping TestWarmStart in test_data_store_csv.py as the implementation has changed - warm start is implemented in Store; DataStore implementations should only be concerned with reporting available_results (line 329 onwards).
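A sketch of that division of responsibility - apart from available_results, the names and the shape of its return value are assumptions:

```python
# Sketch: the Store decides how to warm-start; the DataStore only
# reports which results already exist. Assumes each available_results
# entry starts with the timestep.

class Store:
    def __init__(self, data_store):
        self.data_store = data_store

    def latest_completed_timestep(self, modelrun_name):
        available = self.data_store.available_results(modelrun_name)
        timesteps = [entry[0] for entry in available]
        return max(timesteps) if timesteps else None
```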