This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Further accrualPeriodicity values #292

Closed
gbinal opened this issue Mar 13, 2014 · 24 comments

Comments

@gbinal
Contributor

gbinal commented Mar 13, 2014

Moving from GSA/enterprise-data-inventory#92 - cc @dinali @regina-avila

We have a dataset that is updated on a decennial basis, and it fails validation because that's not an option. Can you add it?

I question the use of DCCDAccrualPeriodicity for the accepted values of this element, as it was only a working draft and doesn't appear to have been revisited since its introduction in 2004. In some instances the only value that could fit my data is "completely irregular", which sounds odd and can be inaccurate. I'm glad to see that populating that field is no longer required.

What other values besides decennial are needed by agencies?

@smrgeoinfo
Contributor

Wouldn't it make sense to just report the expected time interval between updates, with a default unit of days?
Annual --> 365
Bimonthly --> 60
Semiweekly --> 3.5
Daily --> 1
Biweekly --> 14 (did I mix up semi and bi?)
Semiannual --> 180
Biennial --> 730
Triennial --> 1100
Three times a week --> 2.3
Three times a month --> 10
Continuously updated --> 1, 0.04, 0.0007, 0.00001 (day, hour, minute, second?)
Monthly --> 30
Quarterly --> 91
Semimonthly --> 15
Three times a year --> 120
Weekly --> 7
Completely irregular --> this might need some kind of special value (-1?), or just 'irregular'
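A minimal Python sketch of the days-based proposal, assuming a plain lookup table and the -1 sentinel floated for "Completely irregular". It uses roughly 91 days for Quarterly and 15 for Semimonthly (three months and half a month); the names `INTERVAL_DAYS` and `updates_per_year` are hypothetical, and only a subset of the terms is shown.

```python
from typing import Optional

IRREGULAR = -1.0  # sentinel suggested for "Completely irregular"

# Approximate interval between updates, in days (a subset of the list above).
INTERVAL_DAYS = {
    "Daily": 1.0,
    "Semiweekly": 3.5,
    "Weekly": 7.0,
    "Biweekly": 14.0,
    "Semimonthly": 15.0,
    "Monthly": 30.0,
    "Bimonthly": 60.0,
    "Quarterly": 91.0,
    "Semiannual": 180.0,
    "Annual": 365.0,
    "Biennial": 730.0,
    "Completely irregular": IRREGULAR,
}


def updates_per_year(term: str) -> Optional[float]:
    """Estimate updates per year implied by a term; None if irregular."""
    days = INTERVAL_DAYS[term]  # raises KeyError for terms not in the table
    if days == IRREGULAR:
        return None
    return 365.0 / days
```

Adding a new cadence such as decennial is just another number (about 3650), but `updates_per_year("Monthly")` comes out near 12.17 rather than 12, a small example of the false precision that day counts can introduce.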

@dsmorgan77
Contributor

I've got one that's updated every five years.

@torrin47
Copy link
Contributor

@smrazgs I appreciate the desire to move away from textual interval descriptions; otherwise the list could grow endlessly (our folks are asking for quadrennial). Using days to specify the interval, however, might imply an inappropriate level of precision. The guidance for the "modified" field already addresses this:

If there is a need to reflect that the dataset is continually updated, ISO 8601 formatting can account for this by giving the duration. For instance, P1D for daily, P2W for every two weeks, and PT5M for every five minutes.
http://en.wikipedia.org/wiki/ISO_8601#Durations

Why not utilize this standard here as well? It's at least as concise and efficient, if not quite as readable as the text descriptions.
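To show how little work the duration syntax takes to process, here is a minimal Python parser for the form quoted above (P1D, P2W, PT5M). It is a hypothetical sketch, not a complete ISO 8601 implementation: it accepts a decimal point (not a comma) on any component, which is looser than the standard, and it ignores the standard's other duration representations. The name `parse_duration` is illustrative only.

```python
import re
from typing import Dict

# Hypothetical sketch: ISO 8601 durations in the PnW or PnYnMnDTnHnMnS form only.
_DURATION_RE = re.compile(
    r"P(?:"
    r"(?P<weeks>\d+(?:\.\d+)?)W"  # week form, e.g. P2W
    r"|"
    r"(?:(?P<years>\d+(?:\.\d+)?)Y)?"  # date part: years, months, days
    r"(?:(?P<months>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T"  # time part must follow the T designator
    r"(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?"
    r")"
)


def parse_duration(value: str) -> Dict[str, float]:
    """Return the components present in an ISO 8601 duration string."""
    match = _DURATION_RE.fullmatch(value)
    # A dangling "T" (e.g. "P1DT") is invalid; a bare "P" is caught below.
    if match is None or value.endswith("T"):
        raise ValueError(f"not an ISO 8601 duration: {value!r}")
    parts = {name: float(num) for name, num in match.groupdict().items() if num is not None}
    if not parts:
        raise ValueError(f"duration has no components: {value!r}")
    return parts
```

For example, `parse_duration("PT5M")` gives `{"minutes": 5.0}`, while a repeating value such as `"R/P1Y"` raises `ValueError` because the `R/` prefix is outside this pattern.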

@smrgeoinfo
Contributor

I like the idea of using the standard, but the duration part of 8601 seems to be less widely used (at least FWIW I was unaware of that part). It certainly provides a way to specify the necessary information.

@gbinal
Contributor Author

gbinal commented Apr 14, 2014

A few thoughts:

  • We're currently adhering to a standard ("Must be a value from DCCDAccrualPeriodicity"), and I definitely think it is important that we stick to one. But,
  • There are a handful of reasonable and discrete requests for alternatives beyond those listed in Dublin Core.

Does anyone know of alternate standards for this field?

@berwin222

At the Department of Defense, we need Quadrennial added to the list of values.

@gbinal
Contributor Author

gbinal commented May 5, 2014

Thanks, Bill. We'll hopefully wrap this issue up soon.

@waldoj, @JoshData, @dsmorgan77, @benbalter, @philipashlock, @mhogeweg -

Any thoughts on my question about alternate standards? If there's not another good answer, the move may be to say that the field must be a value from DCCDAccrualPeriodicity or one of a handful of other options that we list.

@waldoj

waldoj commented May 5, 2014

I do not know of alternate standards, and I've been looking for them, since I need them for another project. For my other project, I've tentatively come to the conclusion that it's necessary to go outside of the standard, basically by doing what you're describing here, allowing select additional values.

@mhogeweg
Copy link
Contributor

mhogeweg commented May 8, 2014

DCCDAccrualPeriodicity appears to be a proposed term that has sat as a working draft in the 'proposed' state for 10 years. That illustrates the challenge of using a fixed domain for values that vary greatly.

ISO 8601 has a nice definition of time interval that is simple to parse, human readable, and allows all the flexibility needed. You could limit variation by further specifying that time intervals be expressed using the duration form, e.g. P3Y1M4DT1H59M26.5S.

@philipashlock philipashlock added this to the Next Version of Common Core Metadata Schema milestone May 8, 2014
@dsmorgan77
Contributor

👍 to the idea of using the ISO 8601 time interval to represent this. Using the time interval offers the most flexibility and allows any agency that needs a particular permutation to just go ahead and express it. DoD wants quadrennial? That's P4Y. I want every 10 minutes? That's PT10M. Speaking of that last example, the ISO 8601 time interval also offers more precision than DCCDAccrualPeriodicity. In the current form, anything that's updated more frequently than daily is shown as "continuously updated."
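The ten-minute case is the easy one to get wrong: in an ISO 8601 duration, an M before the T designator means months and an M after it means minutes, so every 10 minutes is PT10M while P10M is ten months. A small hypothetical helper (`to_duration` is not from any library) that builds single-unit durations makes the rule explicit:

```python
# Hypothetical helper for single-unit ISO 8601 durations.
_DATE_DESIGNATORS = {"year": "Y", "month": "M", "week": "W", "day": "D"}
_TIME_DESIGNATORS = {"hour": "H", "minute": "M", "second": "S"}


def to_duration(count: int, unit: str) -> str:
    """Build a single-unit ISO 8601 duration such as P4Y or PT10M."""
    if count <= 0:
        raise ValueError("count must be a positive integer")
    if unit in _DATE_DESIGNATORS:
        return f"P{count}{_DATE_DESIGNATORS[unit]}"
    if unit in _TIME_DESIGNATORS:
        # Time units go after the "T" designator; without it, "M" means months.
        return f"PT{count}{_TIME_DESIGNATORS[unit]}"
    raise ValueError(f"unknown unit: {unit!r}")
```

So `to_duration(4, "year")` gives the quadrennial `P4Y`, and `to_duration(10, "minute")` gives `PT10M`.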

@ajturner
Copy link

ajturner commented May 9, 2014

👍 ISO8601

@smrgeoinfo
Contributor

👍 ISO8601

@regongithub

I vote for ISO8601

@gbinal
Contributor Author

gbinal commented Jul 17, 2014

There seems to be strong support for migrating to ISO 8601. Some ideas to facilitate this transition would be:

  • a table of common conversions for terms like hourly, daily, weekly.
  • guidance for nontechnical data managers.

@gbinal
Contributor Author

gbinal commented Jul 17, 2014

Some concerns include:

  • the burden on data creators who aren't familiar with 8601.
  • ensuring a human-friendly experience for the end user.
  • migrating and updating data management systems.
  • the importance of versioning for the schema going forward.

@bsweezy

bsweezy commented Jul 17, 2014

Does 8601 allow for a way to indicate that the interval is irregular? I do not see that it does. Do many records use that option, or should they?

@dinali

dinali commented Jul 17, 2014

We have several datasets that are updated irregularly, so it would be helpful to find a standard that supports this.

@philipashlock
Contributor

I'm in favor of ISO8601, but one limitation worth noting:

As far as I know, ISO 8601 doesn't allow you to specify intervals without end dates unless you specify a repeating interval. I've come across two attempts to address this. One is the Dublin Core Collection Description: Open Date Range Format (DCCD ODRF), which states, "This representation of an open date range is not compatible with the representation of a time-interval defined by ISO8601:2000." The other is the LOC Extended Date/Time Format (EDTF). What you can do with 8601 is specify an ambiguous end date (e.g. 2005-05-30/2015, where you just know the end date is in 2015), or, if it's a dataset representing data with repeating intervals, you can specify that without an end date, as noted on Project Open Data (http://project-open-data.github.io/schema/#temporal):

If there is a need to reflect that the dataset is continually updated, ISO 8601 formatting can account for this with repeating intervals. For instance, updated monthly starting in January 2010 and continuing through the present would be represented as: R/2010-01/P1M
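A repeating interval such as R/2010-01/P1M is just slash-separated parts: a repetition marker (R, optionally followed by a count), an optional start, and a duration. A minimal Python sketch, assuming only the two shapes that come up in this thread (R/start/duration and the bare R/duration); shapes with an end date, such as R/P1M/2015-12, are rejected rather than handled. `RepeatingInterval` and `split_repeating_interval` are hypothetical names.

```python
from typing import NamedTuple, Optional


class RepeatingInterval(NamedTuple):
    repetitions: Optional[int]  # None when the count is omitted ("R/...")
    start: Optional[str]  # None for the bare R/<duration> shape
    duration: str


def split_repeating_interval(value: str) -> RepeatingInterval:
    """Split R[n]/<start>/<duration> or R[n]/<duration> into its parts."""
    parts = value.split("/")
    if len(parts) not in (2, 3) or not parts[0].startswith("R"):
        raise ValueError(f"not a repeating interval: {value!r}")
    count = parts[0][1:]
    repetitions = int(count) if count else None
    if len(parts) == 3:
        start, duration = parts[1], parts[2]
    else:
        start, duration = None, parts[1]
    # Shapes with an end date (e.g. R/P1M/2015-12) are deliberately rejected.
    if not duration.startswith("P") or (start is not None and start.startswith("P")):
        raise ValueError(f"unsupported repeating-interval shape: {value!r}")
    return RepeatingInterval(repetitions, start, duration)
```

For the example above, `split_repeating_interval("R/2010-01/P1M")` yields an unbounded repetition starting at `2010-01` with duration `P1M`.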

@lilybradley

Thanks for the clarification, @philipashlock. I misunderstood ISO8601. I thought it was a unit-of-time format, but it is rather a date and time format, i.e., a specific date and time.

Another option would be to represent units/quantities of time with ISO 80000-3:2006, similar to @smrazgs's suggestion. From a practical standpoint, filling in the unit of time is easier without a specific begin/end date, particularly for older surveys or data collection systems. ISO 80000-3:2006 would also allow for an easier find/replace conversion of the current metadata field.

@philipashlock philipashlock modified the milestone: Next Version of Common Core Metadata Schema (1.0 -> 1.1.) Jul 24, 2014
@gbinal
Contributor Author

gbinal commented Aug 15, 2014

There's been a good deal of digging into this, and here's an update:

There's no definitive answer as to what is required for this field. It would be allowable to specify ISO 8601 as the norm, though it's important to note that the value must be a frequency expressed as a duration. In other words, some ISO 8601-structured values would not be appropriate here, for example one that includes a start date with the recurring time period. Instead, values should hold closely to the examples shown below.

| Term | ISO 8601 |
|---|---|
| Annual | R/P1Y |
| Bimonthly | R/P2M or R/P0.5M |
| Semiweekly | R/P3.5D |
| Daily | R/P1D |
| Biweekly | R/P2W or R/P0.5W |
| Semiannual | R/P6M |
| Biennial | R/P2Y |
| Triennial | R/P3Y |
| Three times a week | R/P0.33W |
| Three times a month | R/P0.33M |
| Continuously updated | R/PT1S |
| Monthly | R/P1M |
| Quarterly | R/P3M |
| Semimonthly | R/P0.5M |
| Three times a year | R/P4M |
| Weekly | R/P1W |

Notes:

This structure would allow for any other needs that are not addressed by the above (e.g. decennial = R/P10Y).

On the issue of converting 'completely irregular', it would seem to be allowable to specify that the only alternative to ISO-8601 for this field would be irregular.

Update: @philipashlock just pointed out that the table should be more along the lines of R/P1Y instead of P1Y. I'm updating the above to that effect.
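The rule in this comment (a repeating duration, or the single extension value irregular) can be checked mechanically. A minimal Python sketch, assuming values take exactly the R/P&lt;duration&gt; shape used in the table and that irregular is written in lowercase; `TERM_TO_ISO` and `is_valid_accrual_periodicity` are hypothetical names, not part of any published validator.

```python
import re

# A subset of the term table, plus decennial from the notes.
TERM_TO_ISO = {
    "Decennial": "R/P10Y",
    "Annual": "R/P1Y",
    "Semiannual": "R/P6M",
    "Quarterly": "R/P3M",
    "Bimonthly": "R/P2M",
    "Monthly": "R/P1M",
    "Semimonthly": "R/P0.5M",
    "Weekly": "R/P1W",
    "Semiweekly": "R/P3.5D",
    "Daily": "R/P1D",
    "Continuously updated": "R/PT1S",
}

_NUM = r"\d+(?:\.\d+)?"
# "R/P" followed by either nW, or at least one Y/M/D/T component; no start date.
_ACCRUAL_RE = re.compile(
    rf"R/P(?:{_NUM}W|(?=\d|T\d)(?:{_NUM}Y)?(?:{_NUM}M)?(?:{_NUM}D)?"
    rf"(?:T(?=\d)(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?)?)"
)


def is_valid_accrual_periodicity(value: str) -> bool:
    """Accept "irregular" or a repeating duration such as R/P1Y."""
    return value == "irregular" or _ACCRUAL_RE.fullmatch(value) is not None
```

Under this sketch, `"R/P1Y"` and `"irregular"` pass, while `"P1Y"` (missing `R/`) and `"R/2010-01/P1M"` (includes a start date) are rejected, matching the guidance that the field holds a frequency rather than an anchored interval.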

gbinal added a commit that referenced this issue Sep 4, 2014
Seeking to address #292.  

Note that this is accompanied by an appendix that is [still under construction](https://github.com/project-open-data/project-open-data.github.io/blob/metadata-v-1.1/iso8601_guidance.md#accrualperiodicity).
@gbinal
Contributor Author

gbinal commented Sep 4, 2014

This is addressed in this commit - 582fe16

rebeccawilliams pushed a commit that referenced this issue Oct 2, 2014
Changes that still need to be addressed are changes in structure and should we add usage notes additions here or no?:

* Adds optional describedByType field at the dataset and distribution level (#291, #332)
* Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358)
* Adds fn field as part of contactPoint replacing earlier use of contactPoint (#358)
* Changes publisher field to an object that allows multiple levels of organizations (#296)
* Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335) 
* Changes format field to a human readable description and to exist only within distribution (#272, #293)
* Adds optional description field for use within distribution (#248)
* Adds optional title field for use within distribution (#248)
* Changes accrualPeriodicity field to use ISO 8601 date syntax (#292)
* Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217)
* Changes license field to be a URL (#196)
@bbrotsos

bbrotsos commented Nov 1, 2014

Was it finalized that Project Open Data is extending ISO-8601 vocabulary to include "irregular"?

@gbinal
Contributor Author

gbinal commented Nov 3, 2014

@bbrotsos - yes, that is what is currently on track. You can see the proposed language on the staging instance:

[Screenshot, 2014-11-03: proposed language on the staging instance]

@gbinal
Contributor Author

gbinal commented Nov 7, 2014

Thank you for driving the conversation around this issue and helping to assemble the v1.1 metadata update.

There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. Project Open Data is a living project though. Please continue any conversations around how the schema can be improved with new issues and pull requests!

It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data.
