Replies: 6 comments 3 replies
-
Howdy - I really like this refactor. I've found the current naming extremely confusing while trying to wrap my head around what everything represents. Some notes:

Theme
I like the theme name of divisions.

Division
For the first feature, I think "division" makes me think of a line or polygon, not really a Point, as in "something is being divided". But I get what you mean and don't hate calling it division.

Area
Area is good, I like this feature.

Boundary
For boundary, I don't quite understand why it's needed, so some explanation would be good. Can this not also be captured by running a calculation over the Area feature?

Properties

SubType
How would very minute divisions like a political district be represented? I'm thinking like the district of a House Representative. Possibly a locality?

Perspectives
Can you give some more examples of this one? It's saying India is a country, but don't we already know that from the subType?

Hierarchies
Same comment as perspective: can't this information be derived with the parentDivisionId?

Norms & Demographics
Is a nested dictionary the best way to represent this? I feel like it might make more sense to break into separate tables or objects, as I could see these growing extremely large. Will breaking these out in the future break backwards compatibility?

Another comment for all themes: what is the thinking with temporal dimensions? How will this refactor allow us to model changes in divisions over time?

Thanks for the hard work on this, I'm excited to see it come to life!
-
Thanks for the thoughtful proposal and the 🎩 tip to WOF :) Some questions and comments...
-
Stopping by to say a huge thanks to @nvkelso and @Spothedog1 for the feedback. There are some very good pointed questions and good points in this. Hopefully I'll be able to put together a more complete/substantive response in the near future.
-
Hi @Spotthedog1! We've had a chance to digest some of your feedback and I wanted to share some thoughts. Before sharing them, I want to note that our first version of the divisions schema was (quietly) released on 2024-03-12 as part of schema v0.9.0. Please feel free to check it out here. Much is the same as described above, but some has also evolved.
The main use case for the To some limited degree, boundaries could be generated by cutting up the polygon of the area feature. However, there are places where this isn't really possible. Boundaries carry dispute information and, since not all parts of a boundary are necessarily disputed, the
The divisions schema isn't intended for arbitrary spatial divisions of a country, so the most accurate answer is that we don't ever intend to As a point of interest, a similar case people like to raise is postal delivery areas. We have made an explicit decision within Overture that these don't belong in the divisions schema and would be included in the upcoming addresses schema, as they are very postal address
The idea is that India from India's perspective looks different than India from Pakistan's perspective due to various border disputes between the countries. In a certain sense there are several Indias, not just one. The Based on some of @nvkelso's feedback below, I think we need to tweak the
Yes and no. Let's start with the yes part. If there's only one hierarchy then yes, you can technically generate this information by walking up the

Moving to the no part, a division can technically participate in multiple hierarchies. We again made a calculated choice for simplicity to make

Multiple hierarchies can occur both with and without political perspectives. An example without a political perspectives angle is that the NYC borough of The Bronx technically belongs to Bronx County, a county subordinate to the state, and to New York City. It has both parents at the same time. An example with a political perspective angle is the city of Taipei or a neighborhood therein. From the Taiwanese perspective, the ancestor chain for these entities runs up to the Republic of China. From the "One China" perspective, it runs up to the PRC. The
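To make the "yes and no" concrete, here is a minimal Python sketch of the two lookups discussed in this reply. It is an illustration only, not Overture tooling. The record layout, the IDs, the choice of New York City as The Bronx's default parent, and the representation of each hierarchy as a root-to-leaf list of IDs are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical, simplified division records keyed by ID. The property names
# follow the proposal (parentDivisionId, hierarchies); IDs and values are made up.
DIVISIONS: Dict[str, dict] = {
    "us": {"name": "United States", "parentDivisionId": None},
    "ny": {"name": "New York", "parentDivisionId": "us"},
    "nyc": {"name": "New York City", "parentDivisionId": "ny"},
    "bronx-county": {"name": "Bronx County", "parentDivisionId": "ny"},
    "bronx": {
        "name": "The Bronx",
        "parentDivisionId": "nyc",  # assumed default parent for this example
        "hierarchies": [
            ["us", "ny", "nyc", "bronx"],           # assumed default hierarchy
            ["us", "ny", "bronx-county", "bronx"],  # second, non-default hierarchy
        ],
    },
}


def walk_up(division_id: str, divisions: Dict[str, dict]) -> List[str]:
    """Follow parentDivisionId links and return the chain from root to division."""
    chain: List[str] = []
    current = division_id
    while current is not None:
        chain.append(current)
        current = divisions[current]["parentDivisionId"]
    chain.reverse()
    return chain


def ancestor_chains(division_id: str, divisions: Dict[str, dict]) -> List[List[str]]:
    """Return all ancestor chains: the materialized hierarchies if present,
    otherwise the single chain recovered by walking up the parent links."""
    hierarchies = divisions[division_id].get("hierarchies")
    if hierarchies:
        return hierarchies
    return [walk_up(division_id, divisions)]
```

With a single hierarchy, `walk_up` recovers everything. For The Bronx it can only ever return the default chain, while `ancestor_chains` also exposes the Bronx County chain without any recursive query.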
Separate tables or objects aren't on the agenda because we have made the decision that every Overture object is a feature and must be representable as GeoJSON. In the version of the divisions schema we released in schema v0.9.0, linked above, we eliminated the We retained Since all properties must be attached to features, the only alternative to nesting is pulling everything from the nested object up to the top level. This mutes the natural explanatory power of grouping and nesting, tends to cause humans to get lost, and in the SQL context results in
We don't currently anticipate adding temporal dimensions to the schema. However, we do believe that our novel data release structure will help with temporal analysis, and that in the future as the schema stabilizes (soon!) and we begin to publish release-over-release GERS ID change information, temporal analysis will get even easier. The nice thing about our release structure is that every release is a full point-in-time snapshot in analysis-friendly cloud-native Parquet. Since all the data are available for all time, and the releases are very easy to compare to each other (e.g. by We could potentially add a bit more temporally useful release metadata by "wrapping" the releases in a technology like Iceberg. I'm curious to hear thoughts there.
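As a rough illustration of comparing two full snapshots, here is a minimal Python sketch. It assumes each release has already been loaded into a dictionary keyed by feature ID, and that IDs are stable between the two releases being compared. Both are assumptions for the example, as is the `diff_snapshots` name. The comparison method mentioned in the reply above is cut off in this thread, so this is not a reconstruction of it.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory form of one release: feature ID -> properties.
Snapshot = Dict[str, dict]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) feature IDs between two snapshots."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    # A feature counts as changed if it exists in both releases with different properties.
    changed = {fid for fid in old.keys() & new.keys() if old[fid] != new[fid]}
    return added, removed, changed


# Made-up example data for two consecutive releases.
release_a: Snapshot = {
    "d1": {"subtype": "country", "name": "Foo"},
    "d2": {"subtype": "locality", "name": "Old Town"},
}
release_b: Snapshot = {
    "d1": {"subtype": "country", "name": "Foo"},
    "d2": {"subtype": "locality", "name": "New Town"},
    "d3": {"subtype": "locality", "name": "Harbor"},
}

added, removed, changed = diff_snapshots(release_a, release_b)
```

The same three-way split maps onto a full outer join on the ID column when the snapshots are queried directly as Parquet.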
-
@nvkelso this was a key question:
Current approach

It seems at first that this can be achieved in the current schema by creating a dedicated perspective for country Y and expressly including in that perspective only those countries it does recognize and not those that it doesn't recognize.

Let's say for the sake of argument that the country of Bar is not recognized by the country of Foo, but Foo recognizes every other country. Basically there would need to be a division for Bar that contains a perspective entry for every country-level division except Foo.

    id: 456
    type: Feature
    geometry: ...
    properties:
      theme: divisions
      type: division
      subtype: country
      names:
        primary: Bar
      country: BR # Country = Bar (not Brazil, this example is fictitious!)
      perspectives: # Basically every country except Foo is in this list 👇
        - type: country
          holder: BZ # Country = Baz (not Belize, this example is fictitious!)
        - type: country
          holder: QX # Country = Qux
        - ...

The biggest drawback of this approach is you need to duplicate the country Bar, because you would need one instance of Bar in the default perspective (no

So you end up with something like:
Suggested alternative approaches

A PR to schema was raised last week to try to tackle this issue: #162. Everyone is welcome to join the PR discussion!
-
FYI: BigQuery does not support arrays of arrays so the new This is related to point 4. above:
If hierarchy were objects instead of arrays, then a Obviously, leaking BigQuery concerns into the schema design is not desirable. But perhaps this is something worth keeping in mind if this limitation exists in other similar engines, making it harder for (some) users to use directly. For reference for other BigQuery users, when creating an external table, you need to set
-
In the upcoming months, Overture is marching toward a v1.0.0 schema release which, once released, we hope will be forward-compatible with new features and free from backwardly-incompatible schema changes... As part of our effort to get to v1.0.0, we are looking at refactoring our current "admins" theme to achieve a number of design goals, some smaller, and some larger. The following are the main goals of the refactor:

Request for Comments
We would love to get your feedback and field your questions on this proposed refactor!

Goals of Refactor
- v0.7.0 is geopolDisplay=hidden.
- The localityType field is both confusing from a naming perspective and also confused as to what it is trying to represent: a local perspective, or a canonical value.

Proposed Feature Types
The proposed refactored theme would be named "divisions" (alternatives considered: boundaries, administration, administrative). It would initially consist of three primary feature types:

- division is a Point feature representing the approximate position of some kind of recognized official or non-official organization of people: country, province, city, neighborhood, etc., as seen from a given political perspective.
- area is a Polygon or MultiPolygon feature capturing the shape of the land area, or the land area + territorial sea, belonging to a division feature, as seen from that division's political perspective.
- boundary is a LineString feature capturing a shared border line between two division features, where the geometry is either wholly maritime or wholly non-maritime.

The properties initially available on these three feature types are shown in the matrix table below. Each property is described in short form, and a longer description is given further below. Please be advised that this proposal leans on, and in some places shamelessly copies, prior art and ideas from the Who's on First ("WOF") schema. In particular, the subType property borrows heavily from WOF ideas about placetypes, and the hierarchies property borrows from WOF the idea of replicating multiple hierarchies on each feature.

Feature types (table columns): division, area, boundary

Properties (table rows):
- subType
- localType: what the subType is called locally within the division.
- names
- perspectives
- hierarchies
- parentDivisionId
- country
- subDivision
- norms: norms object with information about local rules and customs within the division.
- demographics: demographics object with information about the division's demographics, e.g. population.
- flags
- divisionIds
Detail of Selected Properties
subType
The subType property of a division or area contains a normalized string value from the enumeration whose members are listed in the table below. These enumeration members are taken from a simplified version of the full Who's on First placetypes hierarchy.

The allowed containment of the subType fields is shown in the diagram below, where an arrow from a subordinate sub-type to a superior sub-type indicates that the subordinate sub-type may be contained by the superior one. We may be missing some arrows.

perspectives
The optional perspectives property of a division feature is a non-empty array of perspective objects documenting the political perspective(s) from which the division is viewed. A perspective is a simple object with two mandatory fields, type and value. The below example documents India's political perspective:

The type field contains a string value from an enumeration. Currently the only enumeration member is "country". The value field contains a string value; if type is "country" then value must contain an ISO 3166-1 alpha-2 country code string.

If the perspectives property is omitted, it means the division is seen from a "default" political perspective. Otherwise, it is a non-empty array of perspective objects indicating the non-default perspectives the division corresponds to. The term "default" does not imply any value judgment; the default is typically the most commonly accessed perspective.

hierarchies
Every division feature has a hierarchies property containing a non-empty array of hierarchies. hierarchies[0] will be the default hierarchy and the order of the others will be "arbitrary", in that we don't specifically say what it will be, other than that we will strive to keep the order consistent.

By containing a materialized view of hierarchies in this way, this structure enables an individual division to be understood in its full context, and enables the full ancestor tree structure (looking upward) to be queried against without needing to construct an obscure/complex SQL statement (like WITH RECURSIVE in Trino/Amazon Athena). This structure is inspired by wof:hierarchies from Who's on First.

parentDivisionId
For divisions that are countries, parentDivisionId is omitted. Otherwise, it contains the ID of the division feature that constitutes the division's immediate parent in the administrative hierarchy; or, if the division has multiple parents according to different hierarchies, then it is the default parent.

The simplifying decision to have parentDivisionId be a scalar property instead of an array, and thus reflect at most one parent, was made to make the most common use cases simple and easy, while the more complex use cases remain possible via hierarchies.

country
For divisions that have their own dedicated ISO 3166-1 alpha-2 country code, the country property contains that code. Not all such features will have subType=country, e.g. Puerto Rico.

For divisions that do not have their own dedicated ISO 3166-1 alpha-2 country code but where an ancestor feature does, this property duplicates the country code of the ancestor feature. If there are multiple ancestors due to multiple political perspectives, then this property contains the country code of the ancestor seen from the default political perspective.

A scalar property rather than an array is used for the same reason as for parentDivisionId: to make the most common cases easy, knowing that complex cases are possible using alternative means.

subDivision
For divisions that have their own dedicated ISO 3166-2 principal subdivision code, the subDivision property contains that code.

For divisions that do not have their own dedicated principal subdivision code, but where an ancestor feature does, this property duplicates the principal subdivision code of the ancestor feature. If there are multiple ancestors due to multiple different political perspectives, then this property contains the principal subdivision code of the ancestor seen from the default political perspective.

A scalar property rather than an array is used for the same reason as for parentDivisionId: to make the most common cases easy, knowing that complex cases are possible using alternative means.

norms
The optional norms property of a division feature contains a norms object which specifies a small set of local rules and customs that are important to mapping use cases.
demographics
The optional demographics property of a division contains a demographics object which specifies a small set of demographics sub-properties that are important to mapping use cases.

flags
The flags property is an array property containing a set of flags that may apply to the feature. The following flag options are available:

area feature type:
- isMaritime flag, if present, indicates that the feature includes not only the division's land area but also its territorial sea. If absent, the feature contains only the land area.

boundary feature type:
- isMaritime flag, if present, indicates that the boundary geometry lies wholly in a water body.
- isDisputed flag, if present, indicates that the location of the current boundary is not controversial, but that there is a dispute about where it should be.
- isAmbiguous flag, if present, indicates that there is a dispute about where the boundary should be and that its current location is also controversial.

divisionIds
The divisionIds property of a boundary line feature is an array of length exactly two, where the first element is the ID of the division to the boundary's left, and the second element is the ID of the division to the boundary's right, as seen by a person facing in the same direction as the boundary's line string geometry.