Provide guidance for how agencies should designate Geospatial data #303
Comments
Since the use of the |
That's a reasonable suggestion, but the specification (and ISO) allows "(4) a geographic feature from the GeoNames database." Would the GeoPlatform handle that gracefully? The EPA's metadata catalog requires bounding boxes on all records, including non-geo ones, which default to the US extent. What about adopting a standard keyword to denote geo records? |
Most geospatial datasets will use a bounding box instead of a name to indicate the (spatial) extent of the dataset/service. Since Data.gov harvests the vast majority of its content from existing agency catalogs that already have FGDC/ISO metadata, it would be preferable for the Data.gov catalog to use the existing metadata elements to determine whether the data should be considered spatial. |
I agree with Marten that looking for a bounding box makes the most sense, but there is a problem when systems default to some regional (US extent) or global (-90,-180, 90, 180) bounding box when it's not specified. For the NGDS we try to get a keyword 'non-geographic' added to explicitly indicate resources that do not have a geospatial footprint. Unless there is some explicit indication, telling resources that have a meaningful extent apart from resources that really aren't associated with a specific location is a very difficult problem. |
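To make the two signals in the comment above concrete, here is a minimal sketch in Python. It assumes each record exposes a list of keywords and an optional bounding box given as (west, south, east, north); the function name, the argument layout, and the set of "default" extents are all hypothetical and would need to match whatever a real catalog actually emits.

```python
# Assumption: bounding boxes are (west, south, east, north) in decimal degrees.
# Assumption: this set lists the placeholder extents a catalog fills in when
# none is supplied. Only the global extent is included here; a regional
# default (such as a US extent) would have to be added with its real values.
GLOBAL_BBOX = (-180.0, -90.0, 180.0, 90.0)
DEFAULT_BBOXES = {GLOBAL_BBOX}


def has_meaningful_extent(keywords, bbox, defaults=DEFAULT_BBOXES):
    """Guess whether a record has a real spatial footprint.

    Returns False if the record is explicitly tagged 'non-geographic',
    has no bounding box, or only carries a known placeholder extent.
    """
    # An explicit keyword wins over anything in the bounding box.
    if any(k.strip().lower() == "non-geographic" for k in keywords):
        return False
    if bbox is None:
        return False
    # Normalise to floats so (-180, -90, 180, 90) matches the default.
    return tuple(float(v) for v in bbox) not in defaults
```

This is only a heuristic: a dataset that genuinely covers the whole globe would be misclassified, which is exactly why an explicit keyword is the more reliable of the two signals.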
It seems like there may not be a clear way of tying a metadata field from Project Open Data to whether FGDC wants to consider a dataset 'geospatial'. The status quo seems to be working, if imperfectly, so there may not be a consensus to change this now. |
It seems like there is a potential difference between two interpretations of "geospatial" in this conversation: (1) dataset contains attributes which describe location, (2) dataset is suitable to be included on geospatial.gov. It seems to me that (2) is a subset of (1). It would be confusing to try to handle both scenarios with one field, especially since providers may be unfamiliar with what is included on (2). Ideally, our navigation/aggregator/inventory software could look at the actual data and make a "guess" about its geospatialness. |
I think that I may have been taking a solution and going in search of a problem. I'm inclined to close this issue for now. |
@gbinal @philipashlock |
Is there any documentation or guidance on what distinguishes (2) from (1)? |
I would not go and try to build a reasoning engine that reads the data and then decides whether it's geospatial or not. There are two things imho:
|
Ah, spatial footprint (associated with the set) vs. spatial data (associated with the rows) is a useful distinction. An edge case occurs to me: if a set consists of multiple files, each of which has its own different footprint (not location), a consolidated file will then have a footprint associated with each row. Would you consider the original unconsolidated set (which probably has a footprint that consists of a union of the footprints of its contained sets) to just have a footprint, or spatial data as well? |
One can have great discussions on this, but mostly it comes down to doing what's practical and what makes common sense. After all: what is the extent of a point data set? The points? A convex hull around the points, or a bounding box? As long as our monitors and paper are rectangular, people typically use a minimum bounding box. Each state in the US has its own county data sets (an arbitrary example), and then there are national county data sets that combine them all in one. If both types are considered spatial data, I would consider the collection of state county data sets also a spatial data set. |
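For illustration, the minimum bounding box of a point set and the union of several footprints can be sketched as follows. This assumes plain (lon, lat) pairs and boxes given as (west, south, east, north); the function names are hypothetical, and the sketch deliberately ignores data that crosses the antimeridian or a pole.

```python
def minimum_bounding_box(points):
    """Return (west, south, east, north) for an iterable of (lon, lat) pairs."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lons = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    return (min(lons), min(lats), max(lons), max(lats))


def union_bbox(boxes):
    """Return the smallest box that contains every (west, south, east, north) box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

Under this reading, the footprint of a collection (for example, a set of per-state files) is simply `union_bbox` applied to the footprints of its members.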
Bounding boxes break down when you have a swath of data (not global coverage) that happens to cross one or both poles. |
Unless you can specify a polar SRS for the bounding box coordinates. It breaks if the bounding boxes have to be WGS84 or Web Mercator. |
If we refer back to the original post, the context was what should be used to cause records to appear in geoplatform.gov's view of the data.gov listings. In that context, the extent or bounding box is a poor proxy, as many non-spatial datasets could be said to apply to a specific extent, but still lack (as webmaven put it) spatial data associated with the rows. What about leveraging the "format" field? MIME type is a pretty useless designation in the geospatial world; it communicates very little that will help get most data onto a map. If we were to overload this parameter with a set of valid values that do make sense in the geospatial world, it might help. A good start for the valid value list might be here: |
And now I see that's precisely what Marten advocated in this issue: |
This has come up some with the FGDC/geoplatform.gov folks, who are the principal audience of this issue, namely whether there was interest in having a means for datasets to be indicated as candidates for their curation. It's still an ongoing topic, but it looks like what is actually preferred is simply to include a theme of |
Our current guidance for this with regard to data.gov is outlined in http://www.digitalgov.gov/resources/how-to-get-your-open-data-on-data-gov/#federal-geospatial-data Unfortunately this means that agencies need to then manage multiple versions of their data.json file. An alternative approach would be to have a flag in the data.json file that denotes the metadata is available from a preferred source (at least for data.gov's purposes). This might also include the solution for #308. One issue we've encountered is that for agencies which have dozens or hundreds of disparate geospatial harvest sources, the combined version of that as data.json becomes a rather large and unwieldy file. If we were to use the data.json metadata as the filter for avoiding alternate/preferred duplicate sources then we may also want to ensure that metadata is provided as something more like JSON Lines (though ideally still as valid JSON) so that it can be parsed more easily, especially as these JSON files get into the hundreds of megabytes. |
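As a rough sketch of why a line-oriented layout is easier to handle at that size: with one dataset record per line, a harvester can process records one at a time instead of loading the whole file. The helper name below is hypothetical, and the sketch assumes each non-blank line is a complete JSON object.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one parsed record per non-blank line of a text stream.

    Only a single line is held in memory at a time, so the size of the
    whole file does not matter.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Usage with an in-memory sample; a real harvester would pass an open file.
sample = io.StringIO('{"title": "Counties"}\n\n{"title": "Budget"}\n')
titles = [record["title"] for record in iter_json_lines(sample)]
```

Note the trade-off mentioned above: each line is valid JSON on its own, but the file as a whole is no longer a single JSON document, so a standard `json.load` on the entire file would fail.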
Since the 'geospatial' filter for data.gov pretty much drives what ends up on geoplatform.gov's data catalog, there's a need to enable agencies to trigger that.
This is a related but distinct issue from representing FGDC metadata in the schema and is more a question of simply how to trigger that flag.