Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal recursive references support #69

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Stranger6667
Copy link
Contributor

@Stranger6667 Stranger6667 commented Nov 18, 2020

Based on #61 but doesn't include explicit data generation for recursive references.

I will update this comment with the results of different commits to track the progress in a more structured way. Also, I will use the test_can_generate_for_real_large_schema test schemas from RECURSIVE_REFS to check how many tests are working or not (and why). There are 39 schemas with recursive references in this test, excluding one xfail about Draft 3.

Step 1. Skipping recursive references in resolve_all_refs

Based on your comment in #33 and discussions in #61, I made resolve_all_refs to stop resolving recursive references. At this point, it still inlines schemas up to the point where a recursive reference occurred. Tests breakdown:

11/39 - passed
15/39 - failed due to reference resolving (Unresolvable JSON pointer: )
2/39 - recursion error inside the strategy
11/39 - stuck (I waited for a couple of minutes this time)

The recursion error lead me to the following jsonschema behavior:

>>> jsonschema.validate(None, {"$ref": "#"})
RecursionError: maximum recursion depth exceeded

Which occurred in the Meta-validation schema for JSON Schema Draft 4. Here is a minimized version to illustrate this behavior:

{
    "properties": {
        "additionalItems": {
            "anyOf": [{"type": "boolean"}, {"$ref": "#"}],
        }
    },
    "type": "object"
}

Still, it goes to the category where we can't resolve a reference properly - here, it should point to the root schema, but in merged_as_strategies we have subschemas and validators inside this function don't see the parent schema, and this error happens. Having a proper reference resolver there should solve the problem.

I also not sure if it should be somehow addressed on the jsonschema level, the behavior seems undefined for that simplistic case (I still will handle it in my Rust library, since it causes a segfault on the Python bindings level, maybe will catch recursion level and return some error)

Step 2. Reduce inlining in resolve_all_refs

Based on your comment, resolve_all_refs stops inlining the whole branch, but at the moment, it doesn't remove the inlined reference (will take a look at it later). Tests breakdown:

18/39 - passed
1/39 - an example is generated but fails validation (Avro Schema Avsc file)
20/39 - failed due to reference resolving (Unresolvable JSON pointer: )

And no schemas stuck!

Step 3. Pass resolver everywhere where jsonschema validators are used

Most of the previous step errors are related to reference resolving - validators contain sub-schemas that are referring to their ancestors and can't find them. This change adds a resolver that contains the top-level schema, and these validators can resolve references in most cases. There are few exceptions, though - tests from the JSON Schema test suite - "remote ref, containing refs itself" and "valid definition" for Drafts 4 & 7.

There are only 6 failing tests:

  • 4 errors mentioned above, not sure why it is failing yet
  • Avro Schema Avsc file - there is a canonicalisation error, that leads to a wrong type (oneOf + recursive reference)
  • Web component file - seems like a missing type as well for {'$ref': '#'} kind of references

Some schemas are quite slow, but almost everything is working now :)

@Stranger6667 Stranger6667 force-pushed the dd/minimal-recursive-references branch 2 times, most recently from 4896ec3 to 2f1dbeb Compare November 19, 2020 15:38
This was referenced Nov 20, 2020
@buchi-busireddy
Copy link

@Stranger6667 This seems like a useful feature and we also have some use cases where this could come in handy. How complete is this PR? Just curious, do you (or anyone) plan to work on this or are there any other updates that make this unnecessary?

@Stranger6667
Copy link
Contributor Author

@buchi-busireddy

The plan is to refactor the canonicalization process and carry some context around as it is needed to detect the used draft as well. So, this PR is not that relevant structurally, but likely will be useful for such refactoring. It is still needed for recursive reference support though.
At the moment it is far away on my todo list, and I am not sure when I'll be able to work on this direction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants