v3.2: Guidance on searching and evaluating schemas #4743

Open
wants to merge 7 commits into
base: v3.2-dev
112 changes: 110 additions & 2 deletions src/oas.md
@@ -309,7 +309,7 @@ Using a `contentEncoding` of `base64url` ensures that URL encoding (as required

The `contentMediaType` keyword is redundant if the media type is already set:

* as the key for a [MediaType Object](#media-type-object)
* as the key for a [Media Type Object](#media-type-object)
* in the `contentType` field of an [Encoding Object](#encoding-object)

If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
@@ -1257,6 +1257,8 @@ See [Working With Examples](#working-with-examples) for further guidance regardi

This object MAY be extended with [Specification Extensions](#specification-extensions).

Note that correlating Encoding Objects with Schema Objects may require [schema searches](#searching-schemas) for keywords such as `properties`, `prefixItems`, and `items`.

See also the [Media Type Registry](#media-type-registry).

##### Complete vs Streaming Content
@@ -1639,7 +1641,7 @@ These fields MAY be used either with or without the RFC6570-style serialization

| Field Name | Type | Description |
| ---- | :----: | ---- |
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. |
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). The default value depends on the type (determined by a [schema search](#searching-schemas)) as shown in the table below. |
| <a name="encoding-headers"></a>headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. |

This object MAY be extended with [Specification Extensions](#specification-extensions).
@@ -2599,6 +2601,10 @@ Note that JSON Schema Draft 2020-12 does not require an `x-` prefix for extensio
The [`format` keyword (when using default format-annotation vocabulary)](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.2.1) and the [`contentMediaType`, `contentEncoding`, and `contentSchema` keywords](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-8.2) define constraints on the data, but are treated as annotations instead of being validated directly.
Extended validation is one way that these constraints MAY be enforced.

In addition to extended validation, annotations are the most effective way to determine whether these keywords impact the type and structure of the fully parsed data.
For example, formats such as `int64` can be applied to JSON strings, as JSON numbers have limitations that make large integers non-portable.
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes.
Contributor

Suggested change
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes.
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and SHOULD document the limitations this imposes.

Member

@karenetheridge karenetheridge Jul 10, 2025

(removed, commented in wrong section)


###### Validating `readOnly` and `writeOnly`

The `readOnly` and `writeOnly` keywords are annotations, as JSON Schema is not aware of how the data it is validating is being used.
@@ -2611,6 +2617,108 @@ Even when read-only fields are not required, stripping them is burdensome for cl

Note that the behavior of `readOnly` in particular differs from that specified by version 3.0 of this specification.

##### Working with Schemas

In addition to schema evaluation, which encompasses both validation and annotation, some OAS features require inspecting schemas in other ways.

###### Preparing Data for Schema Evaluation

When the data source is a JSON document, preparing the data is trivial as parsing JSON produces a suitable data structure.
Some other media types, as well as URL components and header values, lack sufficient type information to parse directly to suitable data types.

Consider this URL-encoded form:

```uri
foo=42&bar=42
```

As URL query parameters are strings, this would naturally parse to something equivalent to the following JSON:

```json
{
"foo": "42",
"bar": "42"
}
```

But consider this [Media Type Object](#media-type-object) for the form:

```yaml
application/x-www-form-urlencoded:
schema:
type: object
properties:
foo:
type: string
bar:
type: integer
```

From the `schema` field, we can tell that the correct data structure would actually be equivalent to:

```json
{
"foo": "42",
"bar": 42
}
```

In order to prepare the correct data structure for evaluation in such cases, implementations MUST perform a [schema search](#searching-schemas) for the `type` keyword.
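
As a rough illustration of the form example above, the following Python sketch coerces URL-encoded values using the `type` found in each property's schema. The `coerce_form` helper is hypothetical and not part of the specification; it assumes one value per name and looks only at `properties` in the starting schema, without following `$ref` or `allOf`.

```python
from urllib.parse import parse_qs


def coerce_form(raw: str, schema: dict) -> dict:
    """Parse a URL-encoded form and coerce each value by its declared `type`.

    Hypothetical helper for illustration: it only inspects `properties`
    directly on the given schema and keeps unknown fields as strings.
    """
    properties = schema.get("properties", {})
    result = {}
    for name, values in parse_qs(raw).items():
        value = values[0]  # assumption: a single value per field name
        declared = properties.get(name, {}).get("type")
        if declared == "integer":
            result[name] = int(value)
        elif declared == "number":
            result[name] = float(value)
        elif declared == "boolean":
            result[name] = value == "true"
        else:
            result[name] = value  # strings and untyped fields stay as-is
    return result


schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "string"},
        "bar": {"type": "integer"},
    },
}
print(coerce_form("foo=42&bar=42", schema))  # {'foo': '42', 'bar': 42}
```

A real implementation would also need to decide how to report values that cannot be converted to the declared type; this sketch simply lets `int()` or `float()` raise.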

###### Applying Further Type Information

The `format` keyword provides more fine-grained type information, and can even change the underlying data type for the purposes of the application.
For example, if `foo` had the schema `{"type": "string", "format": "int64"}`, the data structure used for validation would still be the same, but the application will need to convert the string `"42"` to the 64-bit integer `42`.
Similarly, the `content*` keywords can indicate further structure within a string.
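
A minimal sketch of the `int64` case, assuming the string has already been validated against the schema. The `apply_format` function is a hypothetical name, and the range check reflects one possible choice rather than required behavior.

```python
def apply_format(value, schema: dict):
    """Convert a validated instance to an application-level value using `format`.

    Hypothetical helper: only the string + int64 combination is handled here.
    """
    if schema.get("type") == "string" and schema.get("format") == "int64":
        number = int(value)
        # Assumption: reject values outside the signed 64-bit range.
        if not -(2**63) <= number < 2**63:
            raise ValueError(f"{value!r} is outside the int64 range")
        return number
    return value  # no recognized format: leave the value unchanged


print(apply_format("42", {"type": "string", "format": "int64"}))  # 42
```

The validated data structure still holds the string `"42"`; only the value handed to the application changes.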

Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and MUST document which approach they implement.
Contributor

Suggested change
Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and MUST document which approach they implement.
Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and SHOULD document which approach they implement.

Member

As discussed in the meeting, if implementations don't do this, what would they do instead? If there isn't anything they can do, then I think the MUST would stand.

Member Author

I really did not expect this PR to get hung up on a debate about how much to require implementations to document their behavior. Which I thought would be thoroughly non-controversial. Why would we not want them to do so?

So... I have no idea. I want everyone else to resolve their differences around documentation requirements so it doesn't hang up this PR, that's my opinion on the matter.


Note that parsing string contents based on `contentMediaType` carries the same security risks as parsing HTTP message bodies based on `Content-Type`; see [Handling External Resources](#handling-external-resources) for further information.

###### Schema Evaluation and Binary Data

Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification.

OAS Implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data).
When the entire instance is binary, this is straightforward as few keywords are relevant.

However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluations:

1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords).
2. Perform [schema searches](#searching-schemas) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data.

Implementations MUST document which strategy or strategies they use, as well as any known limitations.
Contributor

Suggested change
Implementations MUST document which strategy or strategies they use, as well as any known limitations.
Implementations SHOULD document which strategy or strategies they use, as well as any known limitations.


##### Searching Schemas
Contributor

Question about moving this section a little further up the document, who has thoughts?


Several OAS features require searching Schema Objects for keywords indicating the data type and/or structure.
Each feature that needs such a search documents which keywords or structures need to be found.

Even if the requirement is given in terms of schema keywords, if the data is in a form [suitable for schema evaluation](#preparing-data-for-schema-evaluation) and the necessary information (including type) can be determined by inspecting the data (and possibly also annotations such as `format`), implementations MUST support doing so as this is effective regardless of how schemas are structured.

If this is not possible, the schemas MUST be searched to see if the information can be determined without performing evaluation.
As schema organization can become very complex, implementations are not expected to handle every possible schema layout.
However, given a known starting point schema (usually the value of the nearest `schema` field), implementations MUST search the following for the relevant keywords, which vary depending on the use case but might include `type`, `format`, `contentMediaType`, `properties`, `prefixItems`, `items`, etc.:

* The starting point schema itself
* Any schema reachable from there solely through `$ref` and/or `allOf`

These schemas are guaranteed to be applied to any instance.
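
One way the minimum required search could look is sketched below in Python. The function names are hypothetical, and the sketch assumes that every `$ref` is a same-document JSON Pointer reference of the form `#/...`; external references, percent-encoded fragments, and `$dynamicRef` are not handled.

```python
def resolve_pointer(root, ref: str):
    """Resolve a same-document reference such as '#/components/schemas/Pet'."""
    target = root
    for token in ref[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target


def search_keyword(schema, keyword: str, root, seen=None):
    """Yield values of `keyword` from `schema` and from any schema reachable
    solely through `$ref` and/or `allOf`."""
    seen = set() if seen is None else seen
    if not isinstance(schema, dict) or id(schema) in seen:
        return  # boolean schemas carry no keywords; `seen` guards against cycles
    seen.add(id(schema))
    if keyword in schema:
        yield schema[keyword]
    ref = schema.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/"):
        yield from search_keyword(resolve_pointer(root, ref), keyword, root, seen)
    for subschema in schema.get("allOf", []):
        yield from search_keyword(subschema, keyword, root, seen)


document = {
    "components": {
        "schemas": {
            "Base": {"type": "object"},
            "Pet": {
                "allOf": [{"$ref": "#/components/schemas/Base"}],
                "oneOf": [{"type": "string"}],  # not followed by this search
            },
        }
    }
}
pet = document["components"]["schemas"]["Pet"]
print(list(search_keyword(pet, "type", document)))  # ['object']
```

The `oneOf` branch is deliberately ignored, since its subschemas are not guaranteed to apply to every instance.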

In some cases, such as correlating [Encoding Objects](#encoding-object) with Schema Objects using fields in a [Media Type Object](#media-type-object), it is necessary to first find a keyword such as `properties`, and then treat its subschema(s) as starting point schemas for further searches.
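
The two-stage search might be sketched as follows. The helpers `reachable` and `property_types` are hypothetical, and for brevity the sketch follows only `allOf`, not `$ref`; a fuller implementation would follow both.

```python
def reachable(schema):
    """Yield the schema and anything reachable solely through `allOf`.

    Simplification for this sketch: `$ref` is not followed.
    """
    if isinstance(schema, dict):
        yield schema
        for subschema in schema.get("allOf", []):
            yield from reachable(subschema)


def property_types(media_type: dict) -> dict:
    """Map each Encoding Object key to the `type` values found for that property."""
    found = {}
    for name in media_type.get("encoding", {}):
        types = []
        # Stage 1: find `properties` entries for this name from the `schema` field.
        for schema in reachable(media_type.get("schema", {})):
            prop = schema.get("properties", {}).get(name)
            if prop is not None:
                # Stage 2: the property subschema is a new starting point.
                types += [s["type"] for s in reachable(prop) if "type" in s]
        found[name] = types
    return found


media_type = {
    "schema": {
        "allOf": [
            {"properties": {"id": {"type": "integer"}}},
            {"properties": {"note": {"allOf": [{"type": "string"}]}}},
        ]
    },
    "encoding": {"id": {}, "note": {"contentType": "text/plain"}},
}
print(property_types(media_type))  # {'id': ['integer'], 'note': ['string']}
```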

Implementations MAY analyze subschemas of other keywords such as `oneOf` or `dependentSchemas`, or examine possible `$dynamicRef` targets, and MUST document the extent and nature of any such additional support.

###### Handling Multiple Types

When searching for `type`, if the `type` keyword has multiple values, one of which is `"null"` (e.g. `type: ["number", "null"]`), the non-null type MUST be treated as the relevant type if a single type is needed to determine behavior.

For other multi-valued `type` keywords, the behavior is implementation-defined but MUST either follow a documented process or be documented to produce an informative error.

If an implementation supports handling multi-valued `type` keywords for type searches, it SHOULD attempt to use non-`"string"` types before using `"string"` (if `"string"` is one of the types) as all current type interpretation use cases involve data stored in string form by default.

Implementations MAY treat the order of types in the `type` keyword as significant, except when it conflicts with the above requirements.
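
As one possible documented process for multi-valued `type` keywords, the sketch below drops `"null"`, then orders the remaining candidates so that non-`"string"` types are tried before `"string"`, otherwise keeping the listed order. The `candidate_types` name and the choice to return an ordered list are assumptions made for illustration.

```python
def candidate_types(type_value) -> list:
    """Return the types to try, in order, for a `type` keyword value."""
    types = [type_value] if isinstance(type_value, str) else list(type_value)
    non_null = [t for t in types if t != "null"]
    # sorted() is stable: non-"string" types keep their listed order and
    # come first, with "string" (if present) tried last.
    return sorted(non_null, key=lambda t: t == "string")


print(candidate_types(["number", "null"]))             # ['number']
print(candidate_types(["string", "integer", "null"]))  # ['integer', 'string']
print(candidate_types("boolean"))                      # ['boolean']
```

An implementation that prefers to reject ambiguous cases could instead raise an informative error whenever more than one candidate remains.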

##### Data Modeling Techniques

###### Composition and Inheritance (Polymorphism)