Update API spec to show changes for economic static type datasets #491

Merged
merged 14 commits into from
Feb 18, 2025

Conversation

janderson2
Contributor

@janderson2 janderson2 commented Feb 11, 2025

What

Update the public endpoints in the API spec (aka swagger spec) to be in line with the current API behaviour, then layer on top the planned API changes needed to support the required model for the static datasets.

Further work will be required to review the write endpoints to ensure they align where the models used differ.

Linting has been introduced for the API spec. This is only currently configured to be run locally and has not been added to the CI job yet.

Summary of proposed changes to the API

The following are the changes proposed to be made to the API to support the current work to add static datasets to the catalogue:

  • Rename the newly added themes array, which is a list of taxonomy topics for the dataset, to topics. This brings it in line with common terminology across the API and makes it more intuitive for users
  • Remove the unused theme field as it was never used [1] and could cause confusion with topics for users who are familiar with DCAT terminology
  • Remove the unused uri field which was intended to point to the dataset landing page as it was never used [1]
  • Remove the access_rights link from the dataset links as this field has never been used and could be confused with the DCAT term.
  • Remove cantabular_table, cantabular_blob and nomis types from the dataset type enum as they have never been used [1] and will not be used.
  • Add the static dataset type to the dataset type enum
  • Add distributions list which will replace the downloads map to enable a wider range of file types to be published and to allow additional metadata to be provided. This will better align with DCAT and provide the necessary flexibility for the various formats of static dataset published. Backwards compatibility can be provided for CMD and Cantabular datasets by continuing to populate the downloads map, while also populating the new distributions list. Datasets of type static should not populate the deprecated downloads map.
  • Add quality_designation field at the Version level. This field replaces the current national_statistic boolean at the dataset level. This change is to support the 3 current designations a dataset can have (Official, Accredited Official and Official in Development) as well as to represent change in designation over time (e.g. a developmental dataset may become accredited or an accredited dataset may lose its accreditation). Backwards compatibility can be achieved by setting the national_statistic boolean to true when updating the latest_version link if the latest version is accredited-official, otherwise setting the boolean to false.
  • Remove the internal id fields from editions and versions as these are the UUID database IDs and should not be exposed publicly.
  • Replace the singular publisher field, which has never been used [1], with a new publishers array field to support the requirement for a dataset to have multiple publishers. This field will default to a single publisher of ONS if not specified.
  • Remove the unnecessary type field from the publisher. As mentioned above publisher has never been used [1] so this is non-breaking.
  • Add last_updated as a publicly exposed field on both the dataset and version levels to be transparent about when any changes have been made.
  • Update required fields on both the dataset and version models to match the minimal metadata standard. Validation rules may need to be updated to match.
  • Add a latest_edition link to the dataset links to aid users in navigating to the latest edition. This value can be set at the same time as the current latest_version is being set.
  • Merge the version model down into the editions endpoints such that getting the editions list returns the latest version for each edition and getting a specific edition returns the latest version for that edition. This can be done with only one breaking change, which is the removal of the latest_version link in the editions endpoint responses. This change is deemed acceptable because the link is no longer necessary when the response itself is the latest version. This partially addresses one of the main complaints from users of the API: that the deep nesting, and the need to traverse down to the deepest level to get the download links, is frustrating.
  • Add appropriate ETag and Cache-Control headers to all GET responses. The Cache-Control should prevent caching of all authenticated requests and set appropriate cache options for public requests.

[1] "never used" means that the field has never been populated and therefore has never been publicly visible nor has it been utilised by any of our internal systems

Outstanding considerations

While these changes cover the key points, the following still need to be addressed:

  1. Standardisation and documentation of error responses
  2. How to address user frustration that the dataset level response does not return the information about the latest version of the latest edition or a list of the latest version of all editions. Further discussion is needed to decide how to address this.
  3. How to address the user frustration that metadata is split between the dataset and version level. The metadata endpoint was created to solve this, but that requires an additional request and is also not available at the edition level so will negate some of the benefits of merging the version model down into the editions endpoints.
  4. JSON-LD implementation and the development of an appropriate context file to aid in ingesting our API for those familiar with DCAT and the semantic web.

Summary of changes to bring the spec up to standard

The following changes were made to the API spec to bring it up to standard and in line with the current API implementation. They therefore do not require any API changes to be implemented:

  • Introduce linting for the API spec using redocly
  • All linting failures have been addressed
  • Formatting has been updated to be consistent
  • Several grammar and spelling errors have been fixed
  • All auth methods are updated to align to the current implementation
  • Tags have been rationalised to merge the two "Private" tag variants as the distinction was confusing and unnecessary. Descriptions have been added to the tags for clarity.
  • Removal of the observations endpoint as this was split into a separate API years ago
  • Add default host and scheme for the public API to the spec.
  • Clarify that the alert type is an enum by including the enum options and descriptions of the options
  • Add the missing dataset types to the dataset type enum in line with the current implementation
  • Standardise related content links with a common model and indicate the required title and href fields
  • Correct the QMI link to a new model to highlight that it differs from the related content links and only has the href field and that the href is required
  • Add examples, defaults and min/max ranges to numerous fields to improve expectations and clarity for end users (focusing primarily on public endpoint fields that will be used by static type datasets)
  • Standardise the pagination fields with a common model to ensure consistency
  • Ensure the models used by GET endpoints have the correct required fields specified and that arrays that are required to be populated have a min length of 1
  • Update various descriptions (focusing on public endpoints) for clarity and accuracy
  • Add the format attribute to all date-time properties

How to review

Ensure the API spec is valid, clear and aligned to the minimal metadata model.

Try running the new spec linting locally to ensure it works correctly and finds no issues.

Who can review

!me

@janderson2 janderson2 force-pushed the feature/update-swagger-spec branch 2 times, most recently from 1ba371a to a8bd220 Compare February 12, 2025 17:28
@janderson2 janderson2 marked this pull request as ready for review February 12, 2025 17:32
@janderson2 janderson2 requested a review from a team as a code owner February 12, 2025 17:32
@janderson2 janderson2 force-pushed the feature/update-swagger-spec branch from a8bd220 to 2a1fb87 Compare February 12, 2025 17:37
@janderson2 janderson2 changed the title from "Update API spec to show changes for" to "Update API spec to show changes for economic static type datasets" Feb 17, 2025
franmoore05 and others added 14 commits February 18, 2025 20:15
Update the API spec with correct auth methods as the spec is currently
showing outdated auth information.

Simplify the tags by merging the two different `private` tags into one.
Added descriptions to the tags to make it clear what they are denoting.

Also fix the general yaml formatting.
Added swagger spec linting using redocly. This has only been added for
local use and not added to the CI job at this point as it requires node
to be installed resulting in a combined go and node docker container
being needed. This can be added to CI as a later change.
The observations endpoint is not part of the dataset API so should not
be part of this API spec.
* Improve the dataset type enum by adding descriptions of the types and
  removing the unused types. Ensure all references use the common
  dataset type definition.
* Improve the alert type enum to include the enum options and
  descriptions of the options
* Standardise related content links with a common model and indicate the
  required title and href fields
* Correct the QMI link to a new model to highlight that it differs from
  the related content links and only has the href field and that the
  href is required
* Add examples, defaults and min/max ranges to numerous fields to
  improve expectations and clarity for end users
* Standardise the pagination fields with a common model to ensure
  consistency
* Remove the dataset level links.taxonomy link as it is currently always
  returning the incorrect information and should be removed
* Improve the formatting of the deprecation notices in the descriptions
  of the deprecated fields at the dataset level
* Ensure the GET endpoints at the dataset level have the correct
  required fields specified.
* Remove unused uri field on the metadata endpoint
* Add default host and scheme for the public API to the spec.
* Remove unused links.access_rights from metadata endpoint.
Improve the descriptions by fixing typos, clarifying terminology and
improving the clarity of language where possible for all public
endpoints that will be used for static datasets.

Add examples for all fields that will be publicly used for static
datasets to better illustrate to a user what is expected. Also add
`format` attributes to date time string fields for clarity and to ensure
an accurate example is rendered.
Add the new `distributions` array and `quality_designation` field to the
versions model. These fields are being added to support the static
datasets.

`distributions` will replace the existing `downloads` object. The use of
an array rather than a fixed object will provide the flexibility to add
or remove supported formats over time. It also better aligns to the DCAT
model. The `downloads` object will be deprecated, but should continue to
be populated for CMD and Cantabular datasets in order to provide
backwards compatibility until v2.
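
As a rough illustration of the backwards compatibility described above, the sketch below derives a legacy `downloads` object from a `distributions` array. The distribution field names (`format`, `download_url`, `byte_size`), the shape of each `downloads` entry, the set of legacy formats and the `filterable` type value used in the usage note are all assumptions made for this example and are not taken from the spec.

```python
from typing import Any, Dict, List

# Assumed set of formats the legacy `downloads` object can hold (hypothetical).
LEGACY_FORMATS = {"csv", "csvw", "xls", "txt"}


def downloads_from_distributions(
    dataset_type: str, distributions: List[Dict[str, Any]]
) -> Dict[str, Dict[str, str]]:
    """Build the deprecated `downloads` map from the `distributions` list.

    Static datasets never populate `downloads`, so an empty map is returned
    for them. Field names on each distribution are assumptions.
    """
    if dataset_type == "static":
        return {}

    downloads: Dict[str, Dict[str, str]] = {}
    for dist in distributions:
        fmt = str(dist["format"]).lower()
        # Only formats the legacy object knows about can be represented,
        # and the first distribution of each format wins.
        if fmt in LEGACY_FORMATS and fmt not in downloads:
            downloads[fmt] = {
                "href": dist["download_url"],
                "size": str(dist["byte_size"]),
            }
    return downloads
```

Called with a non-static type such as `filterable`, the function keeps only the distributions whose format fits the legacy object; called with `static`, it returns an empty map regardless of input.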

`quality_designation` replaces the `national_statistic` boolean at the
top level. This change is to allow the designation to be changed over
time (e.g. an experimental dataset may receive accreditation at a point
in its version history, an accredited dataset could lose accreditation
over time, etc.). Backwards compatibility can be achieved by populating
the dataset level `national_statistic` field with the latest version's
designation value. If the `quality_designation` is accredited, then the
`national_statistic` boolean should be `true`, else it should be
`false`.
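
The mapping above could be sketched as follows. The value `accredited-official` appears in the PR description; the other designation strings used in the usage note, and the `version` number field used to pick the latest version, are assumptions for illustration only.

```python
from typing import Any, Dict, List

# Value named in the PR description for an accredited dataset.
ACCREDITED = "accredited-official"


def legacy_national_statistic(versions: List[Dict[str, Any]]) -> bool:
    """Derive the dataset level `national_statistic` boolean.

    The latest version is assumed to be the one with the highest
    `version` number (a hypothetical field name for this sketch).
    """
    if not versions:
        # No published versions, so nothing can be accredited yet.
        return False
    latest = max(versions, key=lambda v: v["version"])
    return latest.get("quality_designation") == ACCREDITED
```

For example, a history that moves from an assumed `official-in-development` value at version 1 to `accredited-official` at version 2 yields `True`, and the reverse order yields `False`, which reflects a dataset losing accreditation over time.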
The `id` fields on the edition and version models are the internal DB IDs
and should not be exposed publicly.
Replace the currently unused `publisher` field (singular) with a
`publishers` array to accommodate the ESS requirement of having multiple
publishers.

Also remove the unnecessary publisher `type` field.
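
A minimal sketch of how the default could be applied when `publishers` is missing or empty, based on the PR description's statement that the field defaults to a single publisher of ONS. The `name` key on a publisher object is an assumption; the real publisher model may carry different fields.

```python
import copy
from typing import Any, Dict

# Assumed shape of the default publisher entry (hypothetical field name).
DEFAULT_PUBLISHER = {"name": "ONS"}


def with_default_publishers(dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the dataset with `publishers` defaulted if unset."""
    result = copy.deepcopy(dataset)
    if not result.get("publishers"):
        # Copy the default so callers cannot mutate the shared constant.
        result["publishers"] = [dict(DEFAULT_PUBLISHER)]
    return result
```

Returning a copy rather than mutating the input keeps the stored document unchanged, so the default is only applied at the point where the response is built.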
Ensure the correct required properties are listed and set the correct
min array lengths for the dataset and version models.
Update the API spec to merge the version model down into the editions
model such that the edition is simply the latest version of that
edition. This partially addresses one of the main API user complaints
whereby they are required to determine the latest version every time in
order to get the distribution download URLs.

As the edition model is so simple, this can be achieved while
maintaining backwards compatibility.

Further work is needed to understand whether the user complaint can be
fully addressed by returning the latest version of the latest edition at
the dataset series level (i.e. `/v1/datasets/{id}`) to prevent the user
from having to make that extra request when querying datasets that have
multiple editions and especially those that frequently release new
editions.
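
To illustrate the idea that an edition response is simply the latest version of that edition, the sketch below picks the highest numbered version for each edition from a flat list of versions. The `edition` and `version` field names are assumptions for this example, not a statement of how the API stores or queries its data.

```python
from typing import Any, Dict, List


def latest_version_per_edition(
    versions: List[Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Map each edition name to its highest numbered version document."""
    latest: Dict[str, Dict[str, Any]] = {}
    for version in versions:
        edition = version["edition"]
        current = latest.get(edition)
        # Keep this version if the edition is new or the number is higher.
        if current is None or version["version"] > current["version"]:
            latest[edition] = version
    return latest
```

The values of the returned map correspond to what the editions list would return under the merged model, and a single entry corresponds to the response for one specific edition.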
Revert the removal of `links.taxonomy` for now to prevent breaking
changes, but add a deprecation notice to explain the issues with the
field and that it will provide outdated and misleading information.
All GET responses must include a strong ETag header to enable proper
cache functionality, enable implementation of `If-None-Match`
functionality and enable `If-Match` for subsequent PUT, PATCH and DELETE
requests.

All GET responses must also include the appropriate `Cache-Control`
header to ensure that authenticated requests are not cached, and that
all public requests are cached appropriately.
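
The header behaviour described above might look something like the sketch below. Using a SHA-256 hash of the response body as the strong ETag, `no-store` for authenticated requests and a 300 second `max-age` for public requests are all assumptions chosen for illustration; the PR does not specify how the ETag is generated or which cache directives are used.

```python
import hashlib
from typing import Dict, Optional, Tuple


def strong_etag(body: bytes) -> str:
    """Build a strong ETag (quoted, no W/ prefix) from the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def cache_control(authenticated: bool, max_age: int = 300) -> str:
    """Never cache authenticated responses; allow public ones to be cached."""
    if authenticated:
        return "no-store"
    return f"public, max-age={max_age}"


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison as used by If-None-Match: ignore any W/ prefix."""
    candidates = [c.strip() for c in if_none_match.split(",")]
    if "*" in candidates:
        return True
    stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
    return etag in stripped


def respond(
    body: bytes, if_none_match: Optional[str], authenticated: bool
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = strong_etag(body)
    headers = {"ETag": etag, "Cache-Control": cache_control(authenticated)}
    if if_none_match is not None and etag_matches(if_none_match, etag):
        # The client already holds the current representation.
        return 304, headers, b""
    return 200, headers, body
```

A client that repeats a request with the previously returned ETag in `If-None-Match` receives a 304 with an empty body, while any change to the body produces a new ETag and a full 200 response.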
@janderson2 janderson2 force-pushed the feature/update-swagger-spec branch from 2a1fb87 to a50eccf Compare February 18, 2025 20:16
@janderson2 janderson2 merged commit a50eccf into develop Feb 18, 2025
7 checks passed
@janderson2 janderson2 deleted the feature/update-swagger-spec branch February 18, 2025 20:29
2 participants