You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Something about the load process led to the acs2017_1yr.census_table_metadata being off in a few cases.
Specifically, the value for denominator_column_id was incorrectly set. This column is used to indicate when it's sensible to represent values from the table as percentages of a whole -- its value should be null when values in a table represent medians, aggregates, means, or other values which don't work that way.
Its value should also be null for tables with only one column, such as B01003: Total Population among others.
I fixed this directly in the PostgreSQL database for the running system, but we should review, possibly implement some of the sanity checks below, and possibly create a new PSQL dump (lower priority)
Specifically, separate from the aggregate/mean/etc case, this query should, logically, always return zero rows:
SELECT tmd.*FROMacs2017_1yr.census_table_metadata tmd,
(SELECT table_id, count(*)
FROMacs2017_1yr.census_column_metadataGROUP BY table_id
HAVINGcount(*) =1) as cmd
WHEREtmd.table_id=cmd.table_idANDtmd.denominator_column_idIS NOT NULL;
This command will ensure that the above command returns zero rows. It could probably be included as part of the data process after census_table_metadata is populated:
UPDATEacs2017_1yr.census_table_metadata tmd
SET denominator_column_id =nullFROM (SELECT table_id, count(*)
FROMacs2017_1yr.census_column_metadataGROUP BY table_id
HAVINGcount(*) =1) as cmd
WHEREtmd.table_id=cmd.table_idAND denominator_column_id IS NOT NULL;
This command will show which tables don't have a denominator column, but have more than one row, and don't match the most obvious table naming conventions which are a sign that the data shouldn't be "ratio'd". It is likely to return a few rows, and for now all I can think is that a person can scan them and think about whether there's any likely issue...
SELECT tmd.*FROMacs2017_1yr.census_table_metadata tmd,
(SELECT table_id, count(*)
FROMacs2017_1yr.census_column_metadataGROUP BY table_id
HAVINGcount(*) >1) as cmd
WHEREtmd.table_id=cmd.table_idANDtmd.denominator_column_id IS NULLANDLOWER(table_title) NOT LIKE'%aggregate%'ANDLOWER(table_title) NOT LIKE'%median%'ANDLOWER(table_title) NOT LIKE'%mean%';
The text was updated successfully, but these errors were encountered:
Something about the load process led to the
acs2017_1yr.census_table_metadata
being off in a few cases.Specifically, the value for
denominator_column_id
was incorrectly set. This column is used to indicate when it's sensible to represent values from the table as percentages of a whole -- its value should be null when values in a table represent medians, aggregates, means, or other values which don't work that way.Its value should also be null for tables with only one column, such as
B01003: Total Population
among others.I fixed this directly in the PostgreSQL database for the running system, but we should review, possibly implement some of the sanity checks below, and possibly create a new PSQL dump (lower priority)
Specifically, separate from the aggregate/mean/etc case, this query should, logically, always return zero rows:
This command will ensure that the above command returns zero rows. It could probably be included as part of the data process after
census_table_metadata
is populated:This command will show which tables don't have a denominator column, but have more than one row, and don't match the most obvious table naming conventions which are a sign that the data shouldn't be "ratio'd". It is likely to return a few rows, and for now all I can think is that a person can scan them and think about whether there's any likely issue...
The text was updated successfully, but these errors were encountered: