Update API spec to show changes for economic `static` type datasets #491
Merged
franmoore05
approved these changes
Feb 18, 2025
Update the API spec with correct auth methods as the spec is currently showing outdated auth information. Simplify the tags by merging the two different `private` tags into one. Added descriptions to the tags to make it clear what they are denoting. Also fix the general yaml formatting.
Added swagger spec linting using redocly. This has only been added for local use and not added to the CI job at this point as it requires node to be installed resulting in a combined go and node docker container being needed. This can be added to CI as a later change.
The observations endpoint is not part of the dataset API so should not be part of this API spec.
* Improve the dataset type enum by adding descriptions of the types and removing the unused types. Ensure all references use the common dataset type definition.
* Improve the alert type enum to include the enum options and descriptions of the options
* Standardise related content links with a common model and indicate the required title and href fields
* Correct the QMI link to a new model to highlight that it differs from the related content links and only has the href field and that the href is required
* Add examples, defaults and min/max ranges to numerous fields to improve expectations and clarity for end users
* Standardise the pagination fields with a common model to ensure consistency
* Remove the dataset level links.taxonomy link as it is currently always returning the incorrect information and should be removed
* Improve the formatting of the deprecation notices in the descriptions of the deprecated fields at the dataset level
* Ensure the GET endpoints at the dataset level have the correct required fields specified.
* Remove unused uri field on the metadata endpoint
* Add default host and scheme for the public API to the spec.
* Remove unused links.access_rights from metadata endpoint.
Improve the descriptions by fixing typos, clarifying terminology and improving the clarity of language where possible for all public endpoints that will be used for static datasets. Add examples for all fields that will be publicly used for static datasets to better illustrate to a user what is expected. Also add `format` attributes to date time string fields for clarity and to ensure an accurate example is rendered.
Add the new `distributions` array and `quality_designation` field to the versions model. These fields are being added to support the static datasets. `distributions` will replace the existing `downloads` object. The use of an array rather than a fixed object will provide the flexibility to add or remove supported formats over time. It also better aligns to the DCAT model. The `downloads` object will be deprecated, but should continue to be populated for CMD and Cantabular datasets in order to provide backwards compatibility until v2. `quality_designation` replaces the `national_statistic` boolean at the top level. This change is to allow the designation to be changed over time (e.g. an experimental dataset may receive accreditation at a point in its version history, an accredited dataset could lose accreditation over time, etc.). Backwards compatibility can be achieved by populating the dataset level `national_statistic` field with the latest version's designation value. If the `quality_designation` is accredited, then the `national_statistic` boolean should be `true`, else it should be `false`.
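The backwards-compatibility rule for `national_statistic` can be illustrated with a minimal sketch. It assumes the accredited value is the string `accredited-official` (the value named in the PR description) and that datasets and versions are plain dictionaries; the function names are hypothetical and not part of the API.

```python
# Value named in the PR description; other designation strings are not assumed here.
ACCREDITED = "accredited-official"


def national_statistic_from(quality_designation):
    """Return the legacy boolean for a version's quality designation."""
    return quality_designation == ACCREDITED


def apply_latest_version(dataset, latest_version):
    """Return a copy of the dataset with the legacy flag refreshed.

    Hypothetical helper: intended to run whenever the dataset's
    latest version changes.
    """
    updated = dict(dataset)
    updated["national_statistic"] = national_statistic_from(
        latest_version.get("quality_designation")
    )
    return updated
```

A missing or non-accredited designation yields `false`, which matches the "else it should be `false`" rule above.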
The `id` fields on the edition and version models are the internal DB IDs and should not be exposed publicly.
Replace the currently unused `publisher` field (singular) with a `publishers` array to accommodate the ESS requirement of having multiple publishers. Also remove the unnecessary publisher `type` field.
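The PR description adds that `publishers` defaults to a single publisher of ONS when not specified. A minimal sketch of that defaulting follows; the shape of a publisher item (a `name` key) and the helper name are assumptions, not taken from the spec.

```python
import copy

# Assumed item shape: the spec may define different or additional fields.
DEFAULT_PUBLISHERS = [{"name": "ONS"}]


def with_default_publishers(dataset):
    """Return a copy of the dataset with `publishers` defaulted if absent or empty."""
    updated = dict(dataset)
    if not updated.get("publishers"):
        # Deep copy so callers cannot mutate the shared default.
        updated["publishers"] = copy.deepcopy(DEFAULT_PUBLISHERS)
    return updated
```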
Ensure the correct required properties are listed and set the correct min array lengths for the dataset and version models.
Update the API spec to merge the version model down into the editions model such that the edition is simply the latest version of that edition. This partially addresses one of the main API user complaints whereby they are required to determine the latest version every time in order to get the distribution download URLs. As the edition model is so simple, this can be achieved while maintaining backwards compatibility. Further work is needed to understand whether the user complaint can be fully addressed by returning the latest version of the latest edition at the dataset series level (i.e. `/v1/datasets/{id}`) to prevent the user from having to make that extra request when querying datasets that have multiple editions and especially those that frequently release new editions.
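One way to picture "the edition is simply the latest version" is the sketch below. It assumes each version document carries an integer `version` field and that existing edition fields take precedence where keys collide, so current consumers keep seeing the same values; both points are assumptions, and the function name is hypothetical.

```python
def edition_response(edition, versions):
    """Build an edition response by folding in its highest-numbered version."""
    if not versions:
        raise ValueError("an edition needs at least one version")
    latest = max(versions, key=lambda v: v["version"])
    merged = dict(latest)
    # Assumption: existing edition fields win, preserving backwards compatibility.
    merged.update(edition)
    return merged
```

With this shape a client can read the distributions from the edition response directly instead of first resolving the latest version.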
Revert the removal of `links.taxonomy` for now to prevent breaking changes, but add a deprecation notice to explain the issues with the field and that it will provide outdated and misleading information.
All GET responses must include a strong ETag header to enable proper cache functionality, enable implementation of `If-None-Match` functionality and enable `If-Match` for subsequent PUT, PATCH and DELETE requests. All GET responses must also include the appropriate `Cache-Control` header to ensure that authenticated requests are not cached, and that all public requests are cached appropriately.
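The header behaviour described above can be sketched without any web framework. The example assumes the ETag is a SHA-256 digest of the response body, that `no-store` is used for authenticated requests, and that public responses use a hypothetical `max_age` of 60 seconds; none of these choices are confirmed by the spec.

```python
import hashlib


def strong_etag(body):
    """Strong ETag: a quoted digest of the exact response bytes (no W/ prefix)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def _tags(header_value):
    return [t.strip() for t in header_value.split(",")]


def get_response(body, request_headers, authenticated, max_age=60):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = strong_etag(body)
    cache_control = "no-store" if authenticated else f"public, max-age={max_age}"
    headers = {"ETag": etag, "Cache-Control": cache_control}
    candidates = _tags(request_headers.get("If-None-Match", ""))
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
    if "*" in candidates or etag in stripped:
        return 304, headers, b""
    return 200, headers, body


def if_match_ok(if_match, current_etag):
    """Precondition check for PUT, PATCH and DELETE (strong comparison)."""
    tags = _tags(if_match)
    return "*" in tags or current_etag in tags
```

A write handler would return `412 Precondition Failed` when `if_match_ok` is false, which is what makes the strong ETag on GET responses useful for later updates.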
What
Update the public endpoints in the API spec (aka swagger spec) to be in line with the current API behaviour, and then layer on top the planned API changes that support the required model for `static` datasets.
Further work will be required to review the write endpoints to ensure they align where the models used differ.
Linting has been introduced for the API spec. This is only currently configured to be run locally and has not been added to the CI job yet.
Summary of proposed changes to the API
The following changes to the API are proposed to support the current work to add `static` datasets to the catalogue:
* Rename the `themes` array, which is a list of taxonomy topics for the dataset, to `topics` to be in line with common terminology across the API and more intuitive for users
* Remove the `theme` field as it was never used [1] and could cause confusion with `topics` for users who are familiar with DCAT terminology
* Remove the `uri` field, which was intended to point to the dataset landing page, as it was never used [1]
* Remove the `access_rights` link from the dataset links as this field has never been used and could be confused with the DCAT term.
* Remove the `cantabular_table`, `cantabular_blob` and `nomis` types from the dataset type enum as they have never been used [1] and will not be used.
* Add the `static` dataset type to the dataset type enum
* Add the `distributions` list, which will replace the `downloads` map, to enable a wider range of file types to be published as well as allowing additional metadata to be provided. This will better align with DCAT and provide the necessary flexibility for the various formats of static dataset published. Backwards compatibility can be provided for CMD and Cantabular datasets by continuing to populate the `downloads` map, while also populating the new `distributions` list. `static` type datasets should not populate the deprecated `downloads` map.
* Add the `quality_designation` field at the version level. This field replaces the current `national_statistic` boolean at the dataset level. This change is to support the 3 current designations a dataset can have (Official, Accredited Official and Official in Development) as well as to represent change in designation over time (e.g. a developmental dataset may become accredited or an accredited dataset may lose its accreditation). Backwards compatibility can be achieved by setting the `national_statistic` boolean to `true` when updating the `latest_version` link if the latest version is `accredited-official`, otherwise setting the boolean to `false`.
* Remove the `id` fields from editions and versions as these are the UUID database IDs and should not be exposed publicly.
* Replace the `publisher` field, which has never been used [1], with a new `publishers` array field to support the requirement to be able to have multiple publishers for a dataset. This field will be defaulted to a single publisher of ONS if not specified.
* Remove the `type` field from the publisher. As mentioned above, `publisher` has never been used [1] so this is non-breaking.
* Add `last_updated` as a publicly exposed field on both the dataset and version levels to be transparent about when any changes have been made.
* Add a `latest_edition` link to the dataset links to aid users in navigating to the latest edition. This value can be set at the same time as the current `latest_version` is being set.
* `latest_version` link in the editions endpoint responses. This change is deemed acceptable as the use of this link will no longer be necessary as the response is the latest version. This partially addresses one of the main complaints from users of the API, that the deep nesting and need to traverse down to the deepest level to get the download links is frustrating.
* Add `ETag` and `Cache-Control` headers to all GET responses. The `Cache-Control` header should prevent caching of all authenticated requests and set appropriate cache options for public requests.
[1] "never used" means that the field has never been populated and therefore has never been publicly visible, nor has it been utilised by any of our internal systems
Outstanding considerations
While these changes cover the key points, the following still need to be addressed:
Summary of changes to bring the spec up to standard
The following changes were made to the API spec to bring it up to standard and in line with the current API implementation; they therefore do not require any API changes to be implemented:
* Add spec linting using `redocly`
* … `static` type datasets)
* Add the `format` attribute to all `date-time` properties
How to review
Ensure the API spec is valid, clear and aligned to the minimal metadata model.
Try running the new spec linting locally to ensure it works correctly and finds no issues.
Who can review
!me