NIST 1500 in RDF #8

cwulfman · 2022-02-02T13:06:00Z


Proposal: Write an OWL ontology for NIST 1500-100 and use an RDF triple store as the application back end

The Election Results Common Data Format is actually a graph ontology.

The NIST SP 1500-100 standard defines a common data format for pre-election setup information and post-election results reporting, which the publication calls the Election Results Common Data Format (ERCDF). It uses a UML data model to define the data, and then provides XML and JSON schemas that have been generated automatically from the UML.

Crucially, the documentation declares the UML model to be the primary specification:

The UML model represents a format-independent description of the data required by the three use cases of the specification. Its primary benefit is that it unambiguously defines and describes the data elements and how they are related without requiring readers to know the technical details of any particular data format implementation, e.g., XML. By using a model-based approach, the resultant data format is more likely to be well-structured and more tolerant to modifications. The data format can be generated from the model using commercial tools, thus if changes need to be made to a format, the model can be changed, and the format can be re-generated.

As the document notes, the UML class model is a graph data structure, while the two supported implementation formats, XML and JSON, are tree structures.

Thus, although NIST supports two implementation formats, XML and JSON, the authoritative definition is the UML model. Why support two formats at all, then? The reasons are almost certainly pragmatic: implementers are most likely to work with JSON and, to a lesser extent today, XML. But there is no reason to be wedded to either format; indeed, modeling the specification as an ontology is truer to the graph structure of the UML.

It is better to think of XML and JSON as two equivalent serializations of the same underlying model.

I propose that we consider developing an OWL ontology for NIST 1500-100 based on the UML model, with an RDF triple store as the back-end data store.
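To make the back-end idea concrete, here is a toy sketch of what a triple store does at its core: it holds a set of (subject, predicate, object) statements and answers pattern queries in which any position may be left open. The `TripleStore` class and its `match` method are hypothetical illustrations, not the API of any real product; production stores add indexing, persistence, and a query language such as SPARQL on top.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical toy store: a set of (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Adding the same triple twice has no effect (RDF graphs are sets).
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard, like a variable in a SPARQL triple pattern.
        pattern = (s, p, o)
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple

    def __len__(self) -> int:
        return len(self._triples)


store = TripleStore()
store.add("princeton_precinct_11", "rdf:type", "Precinct")
store.add("princeton_precinct_11", "partOf", "princeton_borough")
store.add("princeton_borough", "partOf", "mercer_county")
store.add("princeton_borough", "partOf", "mercer_county")  # duplicate: ignored

# Which units are directly part of Princeton borough?
children = sorted(s for s, _, _ in store.match(p="partOf", o="princeton_borough"))
print(children)  # ['princeton_precinct_11']
```

This version scans every triple on each query; a real store would keep indexes keyed on each position so that pattern lookups stay fast as the data grows.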

Sample RDF

Before we consider defining the ontology, let's experiment with expressing some data as RDF triples. Here are some simple geopolitical units (GpUnits) in the NIST XML format:

<GpUnit xsi:type="ReportingUnit" ObjectId="princeton_precinct_11">
    <Name>
        <Text Language="en">Precinct 11</Text>
    </Name>
    <Type>precinct</Type>
</GpUnit>
<GpUnit xsi:type="ReportingUnit" ObjectId="princeton_borough">
    <ComposingGpUnitIds>princeton_precinct_11</ComposingGpUnitIds>
    <Name>
        <Text Language="en">Princeton</Text>
    </Name>
    <Type>borough</Type>
</GpUnit>
<GpUnit xsi:type="ReportingUnit" ObjectId="mercer_county">
    <ComposingGpUnitIds>princeton_borough</ComposingGpUnitIds>
    <Name>
        <Text Language="en">Mercer County</Text>
    </Name>
    <Type>county</Type>
</GpUnit>

Here is the equivalent JSON (a direct conversion of the XML above):

"GpUnit": [
   {
     "-xsi:type": "ReportingUnit",
     "-ObjectId": "princeton_precinct_11",
     "Name": {
       "Text": {
         "-Language": "en",
         "#text": "Precinct 11"
       }
     },
     "Type": "precinct"
   },
   {
     "-xsi:type": "ReportingUnit",
     "-ObjectId": "princeton_borough",
     "ComposingGpUnitIds": "princeton_precinct_11",
     "Name": {
       "Text": {
         "-Language": "en",
         "#text": "Princeton"
       }
     },
     "Type": "borough"
   },
   {
     "-xsi:type": "ReportingUnit",
     "-ObjectId": "mercer_county",
     "ComposingGpUnitIds": "princeton_borough",
     "Name": {
       "Text": {
         "-Language": "en",
         "#text": "Mercer County"
       }
     },
     "Type": "county"
   }]

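Here is the equivalent RDF, in Turtle syntax:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Placeholder namespace for the proposed ontology (hypothetical IRI).
@prefix : <http://example.org/ercdf#> .

:princeton_precinct_11 a :Precinct ;
                       rdfs:label "Precinct 11"@en ;
                       :partOf :princeton_borough .

:princeton_borough a :Borough ;
                   rdfs:label "Princeton"@en ;
                   :partOf :mercer_county .

:mercer_county a :County ;
               rdfs:label "Mercer County"@en .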

The RDF is without question more perspicuous than either the XML or the JSON representation, yet it conveys the same information. And because it is RDF, it can convey much more.
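As a rough check that the RDF carries the same information as the XML, here is a sketch that reads the GpUnit fragment with Python's standard `xml.etree.ElementTree` and emits the triples above as plain tuples. The wrapper element `GpUnits`, the function name `gpunits_to_triples`, and the rule that turns a `Type` value into a class name are assumptions for illustration, not part of NIST 1500-100.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The GpUnit fragment from above, wrapped in a hypothetical root element that
# declares the xsi prefix so that the fragment parses on its own.
XML = f"""<GpUnits xmlns:xsi="{XSI}">
  <GpUnit xsi:type="ReportingUnit" ObjectId="princeton_precinct_11">
    <Name><Text Language="en">Precinct 11</Text></Name>
    <Type>precinct</Type>
  </GpUnit>
  <GpUnit xsi:type="ReportingUnit" ObjectId="princeton_borough">
    <ComposingGpUnitIds>princeton_precinct_11</ComposingGpUnitIds>
    <Name><Text Language="en">Princeton</Text></Name>
    <Type>borough</Type>
  </GpUnit>
  <GpUnit xsi:type="ReportingUnit" ObjectId="mercer_county">
    <ComposingGpUnitIds>princeton_borough</ComposingGpUnitIds>
    <Name><Text Language="en">Mercer County</Text></Name>
    <Type>county</Type>
  </GpUnit>
</GpUnits>"""


def gpunits_to_triples(xml_text):
    """Map each ReportingUnit to a class, an rdfs:label, and partOf links."""
    triples = set()
    for unit in ET.fromstring(xml_text).iter("GpUnit"):
        subject = unit.get("ObjectId")
        # Assumption: the ReportingUnit Type value becomes a class name.
        triples.add((subject, "rdf:type", unit.findtext("Type").capitalize()))
        for text in unit.iterfind("Name/Text"):
            triples.add((subject, "rdfs:label", f'"{text.text}"@{text.get("Language")}'))
        # ComposingGpUnitIds points from whole to parts; partOf points from part to whole.
        for child in (unit.findtext("ComposingGpUnitIds") or "").split():
            triples.add((child, "partOf", subject))
    return triples


for triple in sorted(gpunits_to_triples(XML)):
    print(*triple)
```

Note the change of direction: the XML lists parts on the whole, while the RDF attaches `partOf` to each part, so the converter inverts the relationship.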

First, the <ComposingGpUnitIds> element is awkward. It represents geopolitical composition (a particular county contains particular municipalities, a municipality contains particular precincts, and so on) by using XML's ID/IDREF mechanism to link elements together.
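Because these links are just identifier strings, every reference has to be resolved against the `ObjectId` values elsewhere in the document. The sketch below shows that bookkeeping on hand-built data rather than a real parser; the helper name `dangling_refs` and the deliberately misspelled reference are hypothetical.

```python
from typing import Dict, List


def dangling_refs(units: Dict[str, str]) -> List[str]:
    """Return references in ComposingGpUnitIds that match no ObjectId.

    `units` maps each ObjectId to the raw text of its ComposingGpUnitIds
    element (an empty string if the element is absent). IDREFS values are
    whitespace-separated, so one element may name several parts.
    """
    known = set(units)
    missing = []
    for refs in units.values():
        for ref in refs.split():
            if ref not in known:
                missing.append(ref)
    return missing


units = {
    "princeton_precinct_11": "",
    "princeton_borough": "princeton_precinct_11",
    # Hypothetical typo: a reference to a unit that does not exist.
    "mercer_county": "princeton_borough princeton_boro",
}
print(dangling_refs(units))  # ['princeton_boro']
```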

Mereotopology, the study of parts, wholes, and their boundaries, is a rich area of study, and there are numerous systems for describing the relationships among geopolitical units.

So we could replace the awkward <ComposingGpUnitIds> element with relational properties drawn from one of several already-established vocabularies. This is one of the primary features of the Semantic Web: by sharing ontologies, you multiply the ways your data can be linked with other data. (Unfortunately, this feature is often abused: one must be careful not to adopt an ontology simply because it uses the same English words to describe things; those things and relations may mean something very different in the domain for which the ontology was developed.)
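One payoff of a relational property: if `partOf` were declared transitive (for example with `owl:TransitiveProperty`), a reasoner could infer that Precinct 11 is part of Mercer County even though no triple says so directly. The sketch below computes that closure by hand over plain pairs; the function name `transitive_closure` is an illustrative assumption, and an OWL reasoner would derive the same triples without custom code.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(part_of: Set[Edge]) -> Set[Edge]:
    """Return every (part, whole) pair implied by a transitive partOf."""
    closure = set(part_of)
    while True:
        # Join (a, b) with (b, d) to derive (a, d).
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


asserted = {
    ("princeton_precinct_11", "princeton_borough"),
    ("princeton_borough", "mercer_county"),
}
inferred = transitive_closure(asserted) - asserted
print(inferred)  # {('princeton_precinct_11', 'mercer_county')}
```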

The other, much-touted feature of the Semantic Web is its composability. We might be able to say:

:mercer_county owl:sameAs <http://www.wikidata.org/entity/Q496886> .

(Or, if Wikidata is not authoritative enough, we could link to some government-maintained authority file.)

By doing so, mercer_county "inherits" all the statements Wikidata makes about that item.
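Under OWL semantics, `owl:sameAs` means both IRIs denote one individual, so any statement about one holds of the other. A naive way to see the effect is to "smush" the two nodes, rewriting every triple about the alias onto a single canonical name. The sketch below does this over plain tuples; the `wd:` prefix, the function name `smush`, and the `ex:someProperty`/`ex:someValue` statement are hypothetical placeholders, not real Wikidata data.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def smush(triples: Set[Triple]) -> Set[Triple]:
    """Merge nodes linked by owl:sameAs onto one canonical name.

    Simplifications: handles direct sameAs pairs only (no chains), treats the
    subject of each sameAs triple as canonical, and drops the sameAs triples.
    """
    canonical: Dict[str, str] = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            canonical[o] = s
    merged = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        merged.add((canonical.get(s, s), p, canonical.get(o, o)))
    return merged


data = {
    ("mercer_county", "rdfs:label", '"Mercer County"@en'),
    ("mercer_county", "owl:sameAs", "wd:Q496886"),
    # Hypothetical statement standing in for whatever Wikidata asserts.
    ("wd:Q496886", "ex:someProperty", "ex:someValue"),
}
for triple in sorted(smush(data)):
    print(*triple)
```

A reasoner would instead keep both names and entail each statement for both; smushing is just a quick way to see what "inherits" means here.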
