v3.2: Guidance on searching and evaluating schemas #4743
base: v3.2-dev
4c3c8b1
a3db2bb
0912400
6290e79
fa12074
7928dbe
e446e40
@@ -309,7 +309,7 @@ Using a `contentEncoding` of `base64url` ensures that URL encoding (as required

The `contentMediaType` keyword is redundant if the media type is already set:

* as the key for a [Media Type Object](#media-type-object)
* in the `contentType` field of an [Encoding Object](#encoding-object)

If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
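The precedence rule above can be sketched in Python. This is an illustration, not part of the specification: the function names are hypothetical, the governing media ranges are assumed to come from a Media Type Object key or an Encoding Object `contentType` value, and "contradicts" is assumed to mean that `contentMediaType` matches none of those ranges (wildcards such as `image/*` included).

```python
def parse_media_ranges(content_type_field: str) -> list[str]:
    """Split a comma-separated `contentType` value such as "image/png, image/*"."""
    return [part.strip() for part in content_type_field.split(",") if part.strip()]


def _type_and_subtype(media_type: str) -> tuple[str, str]:
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    essence = media_type.split(";", 1)[0].strip().lower()
    main, _, sub = essence.partition("/")
    return main, sub


def media_type_matches(media_type: str, media_range: str) -> bool:
    """True if media_type falls within media_range (e.g. "image/*" or "*/*")."""
    mt_main, mt_sub = _type_and_subtype(media_type)
    r_main, r_sub = _type_and_subtype(media_range)
    if r_main == "*":
        return True
    return r_main == mt_main and r_sub in ("*", mt_sub)


def content_media_type_applies(schema: dict, governing_ranges: list[str]) -> bool:
    """Decide whether a schema's `contentMediaType` should be honored.

    With no governing Media Type Object key or Encoding Object `contentType`,
    the keyword applies on its own. Otherwise it is redundant when it is
    consistent, and ignored when it contradicts every governing range.
    """
    declared = schema.get("contentMediaType")
    if declared is None:
        return False
    if not governing_ranges:
        return True
    return any(media_type_matches(declared, r) for r in governing_ranges)
```

Comparing only the type and subtype keeps the sketch small; an implementation that cares about parameters would need a fuller media type parser.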

@@ -1257,6 +1257,8 @@ See [Working With Examples](#working-with-examples) for further guidance regardi

This object MAY be extended with [Specification Extensions](#specification-extensions).

Note that correlating Encoding Objects with Schema Objects may require [schema searches](#searching-schemas) for keywords such as `properties`, `prefixItems`, and `items`.

See also the [Media Type Registry](#media-type-registry).

##### Complete vs Streaming Content

@@ -1639,7 +1641,7 @@ These fields MAY be used either with or without the RFC6570-style serialization

| Field Name | Type | Description |
| ---- | :----: | ---- |
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). The default value depends on the type (determined by a [schema search](#searching-schemas)) as shown in the table below. |
| <a name="encoding-headers"></a>headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart` type. |

This object MAY be extended with [Specification Extensions](#specification-extensions).

@@ -2599,6 +2601,10 @@ Note that JSON Schema Draft 2020-12 does not require an `x-` prefix for extensio
The [`format` keyword (when using default format-annotation vocabulary)](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.2.1) and the [`contentMediaType`, `contentEncoding`, and `contentSchema` keywords](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-8.2) define constraints on the data, but are treated as annotations instead of being validated directly.
Extended validation is one way that these constraints MAY be enforced.

In addition to extended validation, annotations are the most effective way to determine whether these keywords impact the type and structure of the fully parsed data.
For example, formats such as `int64` can be applied to JSON strings, as JSON numbers have limitations that make large integers non-portable.
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes.

###### Validating `readOnly` and `writeOnly`

The `readOnly` and `writeOnly` keywords are annotations, as JSON Schema is not aware of how the data it is validating is being used.

@@ -2611,6 +2617,108 @@ Even when read-only fields are not required, stripping them is burdensome for cl

Note that the behavior of `readOnly` in particular differs from that specified by version 3.0 of this specification.

##### Working with Schemas

In addition to schema evaluation, which encompasses both validation and annotation, some OAS features require inspecting schemas in other ways.

###### Preparing Data for Schema Evaluation

When the data source is a JSON document, preparing the data is trivial, as parsing JSON produces a suitable data structure.
Some other media types, as well as URL components and header values, lack sufficient type information to be parsed directly into suitable data types.

Consider this URL-encoded form:

```uri
foo=42&bar=42
```

As URL-encoded values are strings, this would naturally parse to something equivalent to the following JSON:

```json
{
  "foo": "42",
  "bar": "42"
}
```

But consider this [Media Type Object](#media-type-object) for the form:

```yaml
application/x-www-form-urlencoded:
  schema:
    type: object
    properties:
      foo:
        type: string
      bar:
        type: integer
```

From the `schema` field, we can tell that the correct data structure would actually be equivalent to:

```json
{
  "foo": "42",
  "bar": 42
}
```

In order to prepare the correct data structure for evaluation in such cases, implementations MUST perform a [schema search](#searching-schemas) for the `type` keyword.
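As a minimal sketch of this step, the Python below converts a URL-encoded body into data suitable for evaluation by looking up each property's `type`. It is simplified on purpose: it only examines `properties` directly on the given schema (a full implementation would perform the complete search described under Searching Schemas), the function names are hypothetical, repeated names simply overwrite earlier values, and values that do not parse as the declared type are left as strings so that schema validation reports the mismatch.

```python
import math
import re
from urllib.parse import parse_qsl

# JSON number grammar (RFC 8259), stricter than Python's int() and float().
_INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)")
_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def coerce(raw: str, schema_type):
    """Convert one string value according to a single JSON Schema `type`."""
    if schema_type == "integer" and _INTEGER.fullmatch(raw):
        return int(raw)
    if schema_type == "number" and _NUMBER.fullmatch(raw):
        value = float(raw)
        if math.isfinite(value):
            return value
    if schema_type == "boolean" and raw in ("true", "false"):
        return raw == "true"
    # "string", missing or unknown types, and unparseable values stay strings,
    # so that schema evaluation can report any mismatch.
    return raw


def prepare_form(body: str, schema: dict) -> dict:
    """Parse a URL-encoded form body, using each property's `type` when present."""
    properties = schema.get("properties", {})
    prepared = {}
    for name, raw in parse_qsl(body, keep_blank_values=True):
        subschema = properties.get(name)
        prop_type = subschema.get("type") if isinstance(subschema, dict) else None
        prepared[name] = coerce(raw, prop_type)  # repeated names: last value wins
    return prepared
```

For the form and schema above, `prepare_form("foo=42&bar=42", schema)` returns `{"foo": "42", "bar": 42}`, matching the second JSON structure.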

###### Applying Further Type Information

The `format` keyword provides more fine-grained type information, and can even change the underlying data type for the purposes of the application.
For example, if `foo` had the schema `{"type": "string", "format": "int64"}`, the data structure used for validation would still be the same, but the application would need to convert the string `"42"` to the 64-bit integer `42`.
Similarly, the `content*` keywords can indicate further structure within a string.
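A minimal Python sketch of applying this further type information after validation, assuming the relevant `format`, `contentEncoding`, and `contentMediaType` values have already been gathered into one dictionary (by annotation collection or a schema search). The function name is hypothetical; only `int64`, `base64`, and `application/json` are handled, and everything else is passed through unchanged.

```python
import base64
import json
import re

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
_INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)")


def apply_type_information(value, keywords: dict):
    """Convert a validated string using `format` and `content*` information.

    `keywords` holds whatever was gathered for the value,
    e.g. {"type": "string", "format": "int64"}.
    """
    if not isinstance(value, str):
        return value
    if keywords.get("format") == "int64":
        if not _INTEGER.fullmatch(value):
            raise ValueError(f"{value!r} is not an integer string")
        number = int(value)
        if not INT64_MIN <= number <= INT64_MAX:
            raise ValueError(f"{value!r} is outside the int64 range")
        return number

    encoding = keywords.get("contentEncoding")
    if encoding == "base64":
        payload = base64.b64decode(value, validate=True)  # bytes
    elif encoding is None:
        payload = value
    else:
        return value  # other encodings are left to the application in this sketch

    if keywords.get("contentMediaType") == "application/json":
        # Parsing embedded content has the same risks as parsing HTTP bodies
        # (see Handling External Resources).
        return json.loads(payload)
    return payload
```

Note that the validated data structure itself is not modified; the conversion produces the application-level value.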

Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and MUST document which approach they implement.

As discussed in the meeting, if implementations don't do this, what would they do instead? If there isn't anything they can do, then I think the MUST would stand.

I really did not expect this PR to get hung up on a debate about how much to require implementations to document their behavior. Which I thought would be thoroughly non-controversial. Why would we not want them to do so? So... I have no idea. I want everyone else to resolve their differences around documentation requirements so it doesn't hang up this PR, that's my opinion on the matter.

Note that parsing string contents based on `contentMediaType` carries the same security risks as parsing HTTP message bodies based on `Content-Type`; see [Handling External Resources](#handling-external-resources) for further information.

###### Schema Evaluation and Binary Data

Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification.

OAS implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data).
When the entire instance is binary, this is straightforward, as few keywords are relevant.

However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluation:

1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords).

handrews marked this conversation as resolved.

2. Perform [schema searches](#searching-schemas) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data.

Implementations MUST document which strategy or strategies they use, as well as any known limitations.
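Option 2 can be sketched in Python for the simplest case: an object schema whose part subschemas sit directly under `properties` of the starting schema. Representing binary parts as `bytes` and the function name are assumptions made for illustration; subschemas reached through `$ref` or `allOf`, `prefixItems`, and keywords such as `patternProperties` would need additional handling.

```python
def split_for_evaluation(instance: dict, schema: dict):
    """Separate binary parts from JSON-compatible parts of a multipart instance.

    Returns (json_instance, json_schema, binary_parts): json_schema can be given
    to an ordinary JSON Schema validator together with json_instance, while
    binary_parts maps each binary part name to (data, subschema) so that its
    subschema can be applied following the binary data rules.
    """
    properties = schema.get("properties", {})
    json_instance, binary_parts = {}, {}
    for name, value in instance.items():
        if isinstance(value, (bytes, bytearray)):
            # `True` is the boolean schema that accepts any instance.
            binary_parts[name] = (value, properties.get(name, True))
        else:
            json_instance[name] = value

    # Copy the schema, dropping subschemas that only apply to binary parts.
    json_schema = dict(schema)
    if "properties" in schema:
        json_schema["properties"] = {
            name: sub for name, sub in properties.items() if name not in binary_parts
        }
    if "required" in schema:
        # A binary part that is present satisfies `required`; absent ones still fail.
        json_schema["required"] = [n for n in schema["required"] if n not in binary_parts]
    return json_instance, json_schema, binary_parts
```

The input schema is not mutated, so the same Schema Object can be reused for the next request.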

##### Searching Schemas

Question about moving this section a little further up the document, who has thoughts?

Several OAS features require searching Schema Objects for keywords indicating the data type and/or structure.
Each feature that needs such a search documents which keywords or structures need to be found.


Even if the requirement is given in terms of schema keywords, if the data is in a form [suitable for schema evaluation](#preparing-data-for-schema-evaluation) and the necessary information (including type) can be determined by inspecting the data (and possibly also annotations such as `format`), implementations MUST support doing so, as this approach works regardless of how schemas are structured.
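For data that has already been prepared, determining the JSON Schema type by inspection can be as simple as the Python sketch below. The function name is hypothetical, and the mapping assumes data shaped like the output of Python's `json` module (`dict`, `list`, `str`, `int`, `float`, `bool`, `None`).

```python
def json_type_of(value) -> str:
    """Return the JSON Schema type name for a JSON-compatible Python value."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        # JSON Schema treats numbers with a zero fractional part as integers.
        return "integer" if value.is_integer() else "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON-compatible value: {value!r}")
```

Because every integer is also a number, a value can satisfy more than one `type`; this sketch reports the more specific one.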

If this is not possible, the schemas MUST be searched to see if the information can be determined without performing evaluation.
As schema organization can become very complex, implementations are not expected to handle every possible schema layout.
However, given a known starting point schema (usually the value of the nearest `schema` field), implementations MUST search the following for the relevant keywords, which vary depending on the use case but might include `type`, `format`, `contentMediaType`, `properties`, `prefixItems`, `items`, etc.:

* The starting point schema itself
* Any schema reachable from there solely through `$ref` and/or `allOf`

These schemas are guaranteed to be applied to any instance.

In some cases, such as correlating [Encoding Objects](#encoding-object) with Schema Objects using fields in a [Media Type Object](#media-type-object), it is necessary to first find a keyword such as `properties`, and then treat its subschema(s) as starting point schemas for further searches.
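The required search can be sketched in Python as a traversal that follows only `$ref` and `allOf` from the starting point schema. The function names are hypothetical, and `make_local_resolver` is a deliberately simplified assumption: it only resolves same-document JSON Pointer fragments such as `#/components/schemas/Upload`, ignoring `$id`-based base URIs, percent-encoding, and external documents. The usage at the end shows the nested case: find `properties`, then search each property's subschemas as new starting points.

```python
from typing import Callable, Iterator

Resolver = Callable[[str], object]


def make_local_resolver(document: dict) -> Resolver:
    """Resolve same-document references like "#/components/schemas/Pet" (sketch only)."""
    def resolve(ref: str):
        if not ref.startswith("#/"):
            raise ValueError(f"only local JSON Pointer references are supported: {ref}")
        node = document
        for token in ref[2:].split("/"):
            node = node[token.replace("~1", "/").replace("~0", "~")]
        return node
    return resolve


def guaranteed_schemas(start, resolve: Resolver) -> Iterator[dict]:
    """Yield the starting schema and every schema reachable via `$ref` / `allOf` only."""
    seen = set()
    pending = [start]
    while pending:
        schema = pending.pop()
        # Boolean schemas carry no keywords; id() guards against reference cycles.
        if not isinstance(schema, dict) or id(schema) in seen:
            continue
        seen.add(id(schema))
        yield schema
        pending.extend(reversed(schema.get("allOf", [])))
        if "$ref" in schema:
            pending.append(resolve(schema["$ref"]))


def find_keyword(start, keyword: str, resolve: Resolver) -> list:
    """Collect every value of `keyword` found in the guaranteed schemas."""
    return [s[keyword] for s in guaranteed_schemas(start, resolve) if keyword in s]


# Usage: correlate part names with their subschemas through `$ref` and `allOf`.
document = {
    "components": {
        "schemas": {
            "Base": {"properties": {"id": {"type": "integer"}}},
            "Upload": {
                "allOf": [
                    {"$ref": "#/components/schemas/Base"},
                    {"properties": {"file": {"contentMediaType": "image/png"}}},
                ]
            },
        }
    }
}
resolve = make_local_resolver(document)
start = {"$ref": "#/components/schemas/Upload"}

part_schemas = {}  # part name -> every subschema found for it
for found in find_keyword(start, "properties", resolve):
    for name, subschema in found.items():
        part_schemas.setdefault(name, []).append(subschema)

# Each property subschema is a new starting point for the next search.
id_types = [t for sub in part_schemas["id"] for t in find_keyword(sub, "type", resolve)]
print(id_types)  # ['integer']
```

Collecting all subschemas per property name, rather than keeping only one, reflects that every `allOf` branch applies to the same instance.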

Implementations MAY analyze subschemas of other keywords such as `oneOf` or `dependentSchemas`, or examine possible `$dynamicRef` targets, and MUST document the extent and nature of any such additional support.

###### Handling Multiple Types

When searching for `type`, if the `type` keyword has multiple values, one of which is `"null"` (e.g. `type: ["number", "null"]`), the non-null type MUST be treated as the relevant type if a single type is needed to determine behavior.

For other multi-valued `type` keywords, the behavior is implementation-defined but MUST either follow a documented process or be documented to produce an informative error.

If an implementation supports handling multi-valued `type` keywords for type searches, it SHOULD attempt to use non-`"string"` types before using `"string"` (if `"string"` is one of the types), as all current type-interpretation use cases involve data stored in string form by default.

Implementations MAY treat the order of types in the `type` keyword as significant, except when it conflicts with the above requirements.
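One possible documented process for these rules is sketched in Python below. The function names are hypothetical; the sketch treats the listed order as significant (which the text permits), moves `"string"` last, ignores `"null"` when another type is present, interprets raw strings only as `integer`, `number`, `boolean`, or `string`, and raises an informative error when no candidate fits.

```python
import math
import re

# JSON number grammar (RFC 8259), stricter than Python's int() and float().
_INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)")
_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def candidate_types(type_keyword) -> list[str]:
    """Order the values of a `type` keyword for interpreting string data."""
    types = [type_keyword] if isinstance(type_keyword, str) else list(type_keyword)
    non_null = [t for t in types if t != "null"] or ["null"]
    # Keep the listed order, but try "string" last, since every raw value is a string.
    return [t for t in non_null if t != "string"] + [t for t in non_null if t == "string"]


def interpret(raw: str, type_keyword):
    """Return raw converted to the first candidate type it fits."""
    for candidate in candidate_types(type_keyword):
        if candidate == "integer" and _INTEGER.fullmatch(raw):
            return int(raw)
        if candidate == "number" and _NUMBER.fullmatch(raw):
            value = float(raw)
            if math.isfinite(value):
                return value
        if candidate == "boolean" and raw in ("true", "false"):
            return raw == "true"
        if candidate == "string":
            return raw
        # "null", "object", and "array" are not interpreted by this sketch.
    raise ValueError(f"{raw!r} does not fit any type in {type_keyword!r}")
```

For example, `interpret("42", ["string", "integer"])` yields the integer `42`, while `interpret("abc", ["string", "integer"])` falls back to the string.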

##### Data Modeling Techniques

###### Composition and Inheritance (Polymorphism)