Update API spec to show changes for economic `static` type datasets #491

janderson2 · 2025-02-11T20:05:30Z

What

Update the public endpoints in the API spec (aka the Swagger spec) to be in line with the current API behaviour, and then layer the planned API changes needed to support the required model for static datasets on top.

Further work will be required to review the write endpoints to ensure they align where the models used differ.

Linting has been introduced for the API spec. It is currently only configured to run locally and has not yet been added to the CI job.

Summary of proposed changes to the API

The following are the changes proposed to be made to the API to support the current work to add static datasets to the catalogue:

Rename the newly added `themes` array (a list of taxonomy topics for the dataset) to `topics`, in line with common terminology across the API, to make it more intuitive for users
Remove the `theme` field, as it has never been used [1] and could cause confusion with `topics` for users who are familiar with DCAT terminology
Remove the `uri` field, which was intended to point to the dataset landing page, as it has never been used [1]
Remove the `access_rights` link from the dataset links, as this field has never been used and could be confused with the DCAT term.
Remove the `cantabular_table`, `cantabular_blob` and `nomis` types from the dataset type enum, as they have never been used [1] and will not be used.
Add the static dataset type to the dataset type enum
Add a `distributions` list, which will replace the `downloads` map, to enable a wider range of file types to be published and to allow additional metadata to be provided. This better aligns with DCAT and provides the necessary flexibility for the various formats in which static datasets are published. Backwards compatibility can be provided for CMD and Cantabular datasets by continuing to populate the `downloads` map while also populating the new `distributions` list. `static` type datasets should not populate the deprecated `downloads` map.
Add a `quality_designation` field at the version level. This field replaces the current `national_statistic` boolean at the dataset level. This change supports the three current designations a dataset can have (Official, Accredited Official and Official in Development) and allows changes in designation over time to be represented (e.g. a developmental dataset may become accredited, or an accredited dataset may lose its accreditation). Backwards compatibility can be achieved by setting the `national_statistic` boolean to `true` when updating the `latest_version` link if the latest version is `accredited-official`, and to `false` otherwise.
Remove the internal id fields from editions and versions as these are the UUID database IDs and should not be exposed publicly.
Replace the singular `publisher` field, which has never been used [1], with a new `publishers` array field to support the requirement for a dataset to have multiple publishers. This field will default to a single publisher, ONS, if not specified.
Remove the unnecessary `type` field from the publisher. As mentioned above, `publisher` has never been used [1], so this is non-breaking.
Add last_updated as a publicly exposed field on both the dataset and version levels to be transparent about when any changes have been made.
Update the required fields on both the dataset and version models to match the minimal metadata standard. Validation rules may need to be updated accordingly.
Add a latest_edition link to the dataset links to aid users in navigating to the latest edition. This value can be set at the same time as the current latest_version is being set.
Merge the version model down into the editions endpoints, such that getting the editions list returns the latest version for each edition, and getting a specific edition returns the latest version for that edition. This can be done with only one breaking change: the removal of the `latest_version` link from the editions endpoint responses. This is deemed acceptable because the link will no longer be needed, as the response itself is the latest version. This partially addresses one of the main complaints from users of the API: that the deep nesting, and the need to traverse down to the deepest level to get the download links, is frustrating.
Add appropriate ETag and Cache-Control headers to all GET responses. The Cache-Control should prevent caching of all authenticated requests and set appropriate cache options for public requests.

[1] "Never used" means that the field has never been populated, so it has never been publicly visible and has never been used by any of our internal systems.

Outstanding considerations

While these changes cover the key points, the following still need to be addressed:

Standardisation and documentation of error responses
How to address the user frustration that the dataset-level response does not return information about the latest version of the latest edition, or a list of the latest versions of all editions. Further discussion is needed to decide how to address this.
How to address the user frustration that metadata is split between the dataset and version levels. The metadata endpoint was created to solve this, but it requires an additional request and is not available at the edition level, so it will negate some of the benefits of merging the version model down into the editions endpoints.
JSON-LD implementation and the development of an appropriate context file to aid in ingesting our API for those familiar with DCAT and the semantic web.

Summary of changes to bring the spec up to standard

The following changes were made to the API spec to bring it up to standard and in line with the current API implementation, and therefore do not require any API changes to be implemented:

Introduce linting for the API spec using Redocly
All linting failures have been addressed
Formatting has been updated to be consistent
Several grammar and spelling errors have been fixed
All auth methods have been updated to align with the current implementation
Tags have been rationalised to merge the two "Private" tag variants as the distinction was confusing and unnecessary. Descriptions have been added to the tags for clarity.
Remove the observations endpoint, as this was split into a separate API years ago
Add default host and scheme for the public API to the spec.
Clarify that the alert type is an enum by including the enum options and descriptions of the options
Add the missing dataset types to the dataset type enum, in line with the current implementation
Standardise related content links with a common model and indicate the required title and href fields
Correct the QMI link to a new model to highlight that it differs from the related content links and only has the href field and that the href is required
Add examples, defaults and min/max ranges to numerous fields to set clearer expectations for end users (focusing primarily on public endpoint fields that will be used by `static` type datasets)
Standardise the pagination fields with a common model to ensure consistency
Ensure the models used by GET endpoints have the correct required fields specified and that arrays that are required to be populated have a min length of 1
Update various descriptions (focusing on public endpoints) for clarity and accuracy
Add the format attribute to all date-time properties
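
As an illustration of what a common pagination model can look like, here is a generic Go type. The field names (`count`, `limit`, `offset`, `total_count`, `items`) and the clamping behaviour of the hypothetical `paginate` helper are assumptions, not taken from the spec; generics require Go 1.18 or later:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Page is a hypothetical common pagination envelope shared by all list
// endpoints; the JSON field names are assumptions.
type Page[T any] struct {
	Count      int `json:"count"`       // number of items in this page
	Limit      int `json:"limit"`       // maximum items requested
	Offset     int `json:"offset"`      // index of the first item returned
	TotalCount int `json:"total_count"` // items available across all pages
	Items      []T `json:"items"`
}

// paginate returns one page of all, clamping offset and limit to valid ranges.
func paginate[T any](all []T, offset, limit int) Page[T] {
	if offset < 0 {
		offset = 0
	}
	if offset > len(all) {
		offset = len(all)
	}
	if limit < 0 {
		limit = 0
	}
	end := offset + limit
	if end > len(all) {
		end = len(all)
	}
	items := all[offset:end]
	return Page[T]{Count: len(items), Limit: limit, Offset: offset, TotalCount: len(all), Items: items}
}

func main() {
	page := paginate([]string{"2023", "2024", "2025"}, 1, 1)
	out, _ := json.Marshal(page)
	fmt.Println(string(out))
	// {"count":1,"limit":1,"offset":1,"total_count":3,"items":["2024"]}
}
```

Defining the envelope once and reusing it for every list endpoint is what keeps the pagination fields consistent.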

How to review

Ensure the API spec is valid, clear and aligned to the minimal metadata model.

Try running the new spec linting locally to ensure it works correctly and finds no issues.

Who can review

!me

Update the API spec with the correct auth methods, as the spec currently shows outdated auth information. Simplify the tags by merging the two different `private` tags into one. Add descriptions to the tags to make it clear what they denote. Also fix the general YAML formatting.

Add Swagger spec linting using Redocly. This has only been added for local use and has not been added to the CI job at this point, as it requires Node to be installed, which would mean a combined Go and Node Docker container is needed. This can be added to CI in a later change.

The observations endpoint is not part of the dataset API so should not be part of this API spec.

* Improve the dataset type enum by adding descriptions of the types and removing the unused types. Ensure all references use the common dataset type definition.
* Improve the alert type enum to include the enum options and descriptions of the options.
* Standardise related content links with a common model and indicate the required title and href fields.
* Correct the QMI link to a new model to highlight that it differs from the related content links and only has the href field and that the href is required.
* Add examples, defaults and min/max ranges to numerous fields to improve expectations and clarity for end users.
* Standardise the pagination fields with a common model to ensure consistency.
* Remove the dataset level links.taxonomy link as it is currently always returning the incorrect information and should be removed.
* Improve the formatting of the deprecation notices in the descriptions of the deprecated fields at the dataset level.
* Ensure the GET endpoints at the dataset level have the correct required fields specified.
* Remove unused uri field on the metadata endpoint.
* Add default host and scheme for the public API to the spec.
* Remove unused links.access_rights from metadata endpoint.

Improve the descriptions by fixing typos, clarifying terminology and improving the clarity of language where possible for all public endpoints that will be used for static datasets. Add examples for all fields that will be publicly used for static datasets to better illustrate to users what is expected. Also add `format` attributes to date-time string fields for clarity and to ensure an accurate example is rendered.

Add the new `distributions` array and `quality_designation` field to the versions model. These fields are being added to support static datasets. `distributions` will replace the existing `downloads` object. Using an array rather than a fixed object provides the flexibility to add or remove supported formats over time, and it also better aligns with the DCAT model. The `downloads` object will be deprecated, but should continue to be populated for CMD and Cantabular datasets to provide backwards compatibility until v2. `quality_designation` replaces the top-level `national_statistic` boolean. This change allows the designation to change over time (e.g. an experimental dataset may receive accreditation at some point in its version history, an accredited dataset could lose its accreditation, etc.). Backwards compatibility can be achieved by populating the dataset-level `national_statistic` field from the latest version's designation: if the `quality_designation` is accredited, the `national_statistic` boolean should be `true`; otherwise it should be `false`.
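
A minimal sketch of the two backwards-compatibility rules above. The `Version`, `Distribution` and `LegacyDownload` shapes are hypothetical, `"filterable"` is assumed to be a CMD dataset type value, and deriving the legacy `downloads` map from `distributions` is one possible approach rather than the agreed one:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical, trimmed-down models; field names are illustrative only.
type Distribution struct {
	Format      string // e.g. "CSV", "XLSX"
	DownloadURL string
	ByteSize    int64
}

type LegacyDownload struct {
	HRef string
	Size string
}

type Version struct {
	QualityDesignation string // e.g. "accredited-official"
	Distributions      []Distribution
}

// legacyDownloads builds the deprecated downloads map for CMD and Cantabular
// datasets. Static datasets must not populate it, so nil is returned.
func legacyDownloads(datasetType string, v Version) map[string]LegacyDownload {
	if datasetType == "static" {
		return nil
	}
	out := make(map[string]LegacyDownload, len(v.Distributions))
	for _, d := range v.Distributions {
		out[strings.ToLower(d.Format)] = LegacyDownload{
			HRef: d.DownloadURL,
			Size: strconv.FormatInt(d.ByteSize, 10),
		}
	}
	return out
}

// nationalStatistic derives the deprecated dataset-level boolean from the
// latest version: true only for accredited official statistics.
func nationalStatistic(latest Version) bool {
	return latest.QualityDesignation == "accredited-official"
}

func main() {
	latest := Version{
		QualityDesignation: "accredited-official",
		Distributions: []Distribution{
			{Format: "CSV", DownloadURL: "https://example.com/v1.csv", ByteSize: 2048},
		},
	}
	fmt.Println(nationalStatistic(latest))                         // true
	fmt.Println(legacyDownloads("filterable", latest)["csv"].Size) // 2048
	fmt.Println(legacyDownloads("static", latest) == nil)          // true
}
```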

The `id` fields on the edition and version models are internal DB IDs and should not be exposed publicly.

Replace the currently unused `publisher` field (singular) with a `publishers` array to accommodate the ESS requirement of having multiple publishers. Also remove the unnecessary publisher `type` field.
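
A sketch of defaulting the `publishers` array to a single ONS publisher when none is supplied, as described in the proposal summary. The `Publisher` fields, the default value and the `publishersOrDefault` helper are hypothetical:

```go
package main

import "fmt"

// Publisher is a hypothetical model; it deliberately has no `type` field.
type Publisher struct {
	Name string `json:"name"`
	HRef string `json:"href,omitempty"`
}

// defaultPublisher represents ONS; the exact field values are assumptions.
var defaultPublisher = Publisher{Name: "Office for National Statistics"}

// publishersOrDefault returns the supplied publishers, or a single ONS
// publisher when none were specified.
func publishersOrDefault(given []Publisher) []Publisher {
	if len(given) == 0 {
		return []Publisher{defaultPublisher}
	}
	return given
}

func main() {
	fmt.Println(publishersOrDefault(nil)[0].Name) // Office for National Statistics
	both := []Publisher{{Name: "Office for National Statistics"}, {Name: "Example co-publisher"}}
	fmt.Println(len(publishersOrDefault(both))) // 2
}
```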

Ensure the correct required properties are listed and set the correct min array lengths for the dataset and version models.

Update the API spec to merge the version model down into the editions model, such that an edition is simply the latest version of that edition. This partially addresses one of the main API user complaints, whereby users must determine the latest version every time in order to get the distribution download URLs. As the edition model is so simple, this can be achieved while maintaining backwards compatibility. Further work is needed to understand whether the user complaint can be fully addressed by returning the latest version of the latest edition at the dataset series level (i.e. `/v1/datasets/{id}`), to save users from making that extra request when querying datasets that have multiple editions, especially those that frequently release new editions.
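
Returning each edition as its latest version amounts to keeping the highest version number per edition. A sketch under assumed names (`Version`, `editionsView`); a real implementation would presumably also restrict public responses to published versions, which is omitted here:

```go
package main

import (
	"fmt"
	"sort"
)

// Version is a hypothetical, trimmed-down version model.
type Version struct {
	Edition string
	Version int
}

// editionsView returns one entry per edition: that edition's highest-numbered
// version. Results are sorted by edition name so the output is stable.
func editionsView(versions []Version) []Version {
	latest := make(map[string]Version)
	for _, v := range versions {
		if cur, ok := latest[v.Edition]; !ok || v.Version > cur.Version {
			latest[v.Edition] = v
		}
	}
	out := make([]Version, 0, len(latest))
	for _, v := range latest {
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Edition < out[j].Edition })
	return out
}

func main() {
	versions := []Version{{"2024", 1}, {"2024", 2}, {"2025", 1}}
	for _, v := range editionsView(versions) {
		fmt.Printf("edition %s -> version %d\n", v.Edition, v.Version)
	}
}
```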

Revert the removal of `links.taxonomy` for now to prevent breaking changes, but add a deprecation notice to explain the issues with the field and that it will provide outdated and misleading information.

All GET responses must include a strong ETag header to enable proper cache functionality, enable implementation of `If-None-Match` functionality and enable `If-Match` for subsequent PUT, PATCH and DELETE requests. All GET responses must also include the appropriate `Cache-Control` header to ensure that authenticated requests are not cached, and that all public requests are cached appropriately.

janderson2 mentioned this pull request Feb 12, 2025

Move version and add quality designation and distribution fields #492

Merged

janderson2 force-pushed the feature/update-swagger-spec branch 2 times, most recently from 1ba371a to a8bd220 February 12, 2025 17:28

janderson2 marked this pull request as ready for review February 12, 2025 17:32

janderson2 requested a review from a team as a code owner February 12, 2025 17:32

janderson2 force-pushed the feature/update-swagger-spec branch from a8bd220 to 2a1fb87 February 12, 2025 17:37

janderson2 changed the title ~~Update API spec to show changes for~~ Update API spec to show changes for economic static type datasets Feb 17, 2025

franmoore05 approved these changes Feb 18, 2025

franmoore05 and others added 14 commits February 18, 2025 20:15

Updates to swagger spec following whiteboard discussion

ea8bb83

Remove observations endpoint as is different API

86ed1a3

Remove internal edition and version ID fields

afc0a4a

Allow multiple publishers

ab8b503

Add last_updated and sort properties alphabetically

024f7bd

Update required dataset and version properties

782ca7e

Revert removal of links.taxonomy

2067340

janderson2 force-pushed the feature/update-swagger-spec branch from 2a1fb87 to a50eccf February 18, 2025 20:16

janderson2 merged commit a50eccf into develop Feb 18, 2025
7 checks passed

janderson2 deleted the feature/update-swagger-spec branch February 18, 2025 20:29
