Package : Utils : Reference

Nuwan Waidyanatha edited this page Nov 23, 2023 · 1 revision

Introduction

The reference package is designed to handle domain-specific, system-wide static data: essentially taxonomical (or categorical) lookup data. It is designed to maintain reference data in any storage form, such as an RDBMS, a NoSQL store, or files.

Note - the current implementation maintains the reference data in a util_refer table in an RDBMS.

Class specifications

  • Export/import reference data from and into structured or semi-structured data files (e.g. JSON, CSV)
  • Read/write reference data to be used in the domain-specific functionality (e.g. lookup and assign categorical data)

Properties

data

Set and get reference data as an Apache Spark dataframe. A typical dataframe holds the following data elements:

    ref_pk      # system-generated integer assigned upon insert
    realm       # schema table entity the realm is associated with
    category    # a category within the table the lookup is for
    code        # alternate code for the reference value
    value       # the reference value used for the specific category
    description # description of the reference value
    source_uuid # uuid or pk of the record from the data source
    data_source # storage the data was taken from, e.g. S3 folder
    data_owner  # reference to the origin of the data
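
As a rough illustration of the data elements above, a single reference record can be sketched as a plain Python dataclass. This is not the package's own code: the class name ReferenceRecord, the field types, the defaults, and the sample values are all assumptions made for the example; the package itself holds these rows in a Spark dataframe.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReferenceRecord:
    """Hypothetical stand-in for one row of the util_refer table."""
    realm: str                          # schema table entity
    category: str                       # category within the realm
    value: str                          # the reference value itself
    code: Optional[str] = None          # alternate code for the value
    description: Optional[str] = None
    source_uuid: Optional[str] = None
    data_source: Optional[str] = None
    data_owner: Optional[str] = None
    ref_pk: Optional[int] = None        # left empty; assigned by the database on insert


# Sample values are invented for illustration only.
record = ReferenceRecord(realm="property", category="room_type",
                         value="double", code="DBL")
row = asdict(record)  # dict form, e.g. for building a dataframe row
```

Leaving ref_pk as None until the insert mirrors the note that the key is system generated.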

Realm list

  • The realm_list comprises all the realm values in the util_refer table
  • When getting the realm_list, if it is empty, the property getter will retrieve all the realm values
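
The getter behaviour described above (populate the list on first access when it is empty) can be sketched as follows. This is a minimal illustration under stated assumptions: the class name RealmListSketch and the in-memory list of rows are hypothetical stand-ins for the package class and the util_refer table.

```python
class RealmListSketch:
    """Hypothetical sketch of a lazily populated realm_list property."""

    def __init__(self, rows):
        self._rows = rows        # stand-in for the rows stored in util_refer
        self._realm_list = []    # empty until first requested

    @property
    def realm_list(self):
        # If the list is empty, collect every distinct realm value.
        if not self._realm_list:
            self._realm_list = sorted({r["realm"] for r in self._rows})
        return self._realm_list


# Sample rows are invented for illustration only.
sample_rows = [
    {"realm": "property", "category": "room_type", "value": "double"},
    {"realm": "booking", "category": "payment", "value": "card"},
    {"realm": "property", "category": "room_type", "value": "single"},
]
ref = RealmListSketch(sample_rows)
realms = ref.realm_list
```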

Realm

  • The realm is domain specific.
  • For example, an entity may have several categorical values defining its category, type, or other taxonomical static values.
  • A realm value must be a value of the realm_list

Category list

  • The property provides a realm specific filtered list of values
  • The setter will add a value to the realm list, if it is unique

Category

  • Each realm can have multiple categories
  • The property getter will validate the category against the category_list
  • To support filtering the realm_list, a category value can be set
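
The validation rules for realm and category can be sketched with plain Python properties. This is only an illustration, not the package's implementation: the class name RealmCategorySketch, the choice of ValueError, and the in-memory rows are assumptions.

```python
class RealmCategorySketch:
    """Hypothetical sketch of realm/category validation."""

    def __init__(self, rows):
        self._rows = rows        # stand-in for the rows stored in util_refer
        self._realm = None
        self._category = None

    @property
    def realm_list(self):
        return sorted({r["realm"] for r in self._rows})

    @property
    def realm(self):
        return self._realm

    @realm.setter
    def realm(self, value):
        # A realm value must be a value of the realm_list.
        if value not in self.realm_list:
            raise ValueError(f"unknown realm: {value!r}")
        self._realm = value

    @property
    def category_list(self):
        # Realm-specific list; unfiltered when no realm has been set.
        return sorted({r["category"] for r in self._rows
                       if self._realm is None or r["realm"] == self._realm})

    @property
    def category(self):
        # The getter validates the category against the category_list.
        if self._category is not None and self._category not in self.category_list:
            raise ValueError(f"unknown category: {self._category!r}")
        return self._category

    @category.setter
    def category(self, value):
        self._category = value


# Sample rows are invented for illustration only.
rows = [
    {"realm": "property", "category": "room_type", "value": "double"},
    {"realm": "booking", "category": "payment", "value": "card"},
]
sketch = RealmCategorySketch(rows)
sketch.realm = "property"
sketch.category = "room_type"
```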

Functions

Get reference

A class function for retrieving realm- and category-specific reference data from storage.

get_reference(
    realm: str = None,
    category: str = None,
    **kwargs
)
  • Retrieves all the reference data and then filters by the realm and category, if specified
  • Returns a Spark dataframe (self._data is set with the newly filtered dataset)
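
The retrieve-then-filter logic can be sketched in plain Python over a list of dictionaries. This is an approximation under stated assumptions: the function name filter_reference and the list-of-dicts input are hypothetical, the real get_reference is a class function that returns a Spark dataframe, and the meaning of **kwargs is not documented here, so it is omitted.

```python
from typing import Dict, List, Optional


def filter_reference(rows: List[Dict[str, str]],
                     realm: Optional[str] = None,
                     category: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical sketch: start from all rows, then narrow by realm and category."""
    result = list(rows)  # "retrieve all the reference data"
    if realm is not None:
        result = [r for r in result if r["realm"] == realm]
    if category is not None:
        result = [r for r in result if r["category"] == category]
    return result


# Sample rows are invented for illustration only.
all_rows = [
    {"realm": "property", "category": "room_type", "value": "double"},
    {"realm": "property", "category": "view", "value": "sea"},
    {"realm": "booking", "category": "payment", "value": "card"},
]
property_rows = filter_reference(all_rows, realm="property")
room_rows = filter_reference(all_rows, realm="property", category="room_type")
everything = filter_reference(all_rows)
```

With a Spark dataframe, the same narrowing would typically be expressed with DataFrame.filter on the realm and category columns.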

Import into DB