Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: enable an optional limit at which the generated annotation for a type is Any #202

Open
sirosen opened this issue Jul 10, 2020 · 3 comments

Comments

@sirosen
Copy link

sirosen commented Jul 10, 2020

When generating annotations, it would be really handy to be able to tell monkey type that if it's going to generate an enormous compound type, I'd like it to just "give up" and use Any.
This is a problem I've run into several times, using monkeytype to add annotations to an existing codebase.

It shows up especially when handling data de/serialization. For example:

# input
def dump(some_json_data):
    return json.dumps(some_json_data)

# output, after a large testsuite run
def dump(some_json_data:  Dict[
    str,
    Union[
        str,
        List[
            Union[
                Dict[
                    str,
                    Union[
                        str,
                        Dict[
                            str,
                            Union[
                                str,
                                List[Dict[str, Union[str, List[str]]]],
                                List[Dict[str, Union[int, str]]],
                            ],
                        ],
                    ],
                ],
                Dict[
                    str,
                    Union[str, Dict[str, Union[str, List[Dict[str, Union[int, str]]]]]],
                ],
                Dict[
                    str,
                    Union[
                        str,
                        Dict[
                            str,
                            Union[
                                str,
                                List[Dict[str, Union[str, List[str]]]],
                                List[
                                    Union[
                                        Dict[str, Union[int, str]],
                                        Dict[str, str],
                                        Dict[str, Union[str, int, Dict[str, int]]],
                                    ]
                                ],
                            ],
                        ],
                    ],
                ],
                Dict[
                    str,
                    Union[
                        str,
                        Dict[
                            str,
                            Union[
                                str,
                                List[Dict[str, Union[str, List[str]]]],
                                List[Union[Dict[str, Union[int, str]], Dict[str, str]]],
                            ],
                        ],
                    ],
                ],
            ]
        ],
    ],
]) -> str:
   return json.dumps(some_json_data)

I think the ideal output in this case is

def dump(some_json_data:  Dict[str, Any]) -> str:
    return json.dumps(some_json_data)

However, it's probably not reasonable to ask that monkeytype do this perfectly.

Two good options...

  1. give up entirely about the type of the structure: def dump(some_json_data: Any) -> str: ...
  2. have some maximum depth for these scenarios, e.g. def dump(some_json_data: Dict[str, Union[str, List, Dict]]) -> str: ...

On the detection side of things, I think you can just count the number of levels of nesting or the total number of compounded types (i.e. count of Dict, str, Union, etc. types in the generated annotation).

In terms of usage, I want to be able to say something like monkeytype run --compound-type-limit 10 -m pytest

@carljm
Copy link
Contributor

carljm commented Jul 10, 2020

Hi, thanks for the report! So the default type rewriter in the default config already includes RewriteLargeUnion which is intended to address this problem. However, it defaults to rewriting any Union with more than 5 elements, and it looks like your particular type escapes this rewrite by being deeply nested but not having a Union of more than 5 elements at any one layer.

In principle I think it should be possible to write a type rewriter that in some way also collapses based on depth of nesting, though I think identifying the right heuristic precisely might be tricky; it's not clear that a deeply-nested but well-specified type should necessarily be collapsed (it might be better to e.g. extract it to a reusable type alias), but deep nesting with lots of unions at multiple layers probably should be.

Given that MonkeyType's type-rewriter and config infrastructure allows you to easily do all of this outside of MonkeyType in your own config file, I would encourage you to experiment with writing a type-rewriter to do this and seeing how it behaves in practice on your codebases. If you reach a solution that you feel works well and want to submit it as a pull request, I'd certainly consider it.

FWIW, at Instagram we ended up disabling even RewriteLargeUnion. Our approach to using MonkeyType is that its purpose is to generate an informative first draft suitable for editing by a developer before committing. Given that it's relatively hard for a developer to find information that type rewriters choose to discard, but relatively easy to delete multiple lines of type annotation, we ended up feeling that it was preferable for MonkeyType to just provide all the information available and allow the developer to make the choice of how to best condense it, since with their understanding of the code context they can make a better choice than an automated type rewriter.

@sirosen
Copy link
Author

sirosen commented Jul 10, 2020

No guarantee I get to it right away, but I'll make a note to give a custom rewriter a try. I wasn't aware of that functionality at all before. I'll open a PR with it if I come up with something good.

One thing I noticed about the type I provided, now that I know to be looking, is that there are cases in which the type could be rewritten to equivalent things which would trigger RewriteLargeUnion.
Specifically, consider this case:

type rewrite to "push Unions down the tree"
# original -- max Union arity = 4
Union[
    Dict[
        str,
        Union[
            str,
            Dict[
                str,
                Union[
                    str,
                    List[Dict[str, Union[str, List[str]]]],
                    List[Dict[str, Union[int, str]]],
                ],
            ],
        ],
    ],
    Dict[
        str,
        Union[str, Dict[str, Union[str, List[Dict[str, Union[int, str]]]]]],
    ],
    Dict[
        str,
        Union[
            str,
            Dict[
                str,
                Union[
                    str,
                    List[Dict[str, Union[str, List[str]]]],
                    List[
                        Union[
                            Dict[str, Union[int, str]],
                            Dict[str, str],
                            Dict[str, Union[str, int, Dict[str, int]]],
                        ]
                    ],
                ],
            ],
        ],
    ],
    Dict[
        str,
        Union[
            str,
            Dict[
                str,
                Union[
                    str,
                    List[Dict[str, Union[str, List[str]]]],
                    List[Union[Dict[str, Union[int, str]], Dict[str, str]]],
                ],
            ],
        ],
    ],
]

# rewrite -- max Union arity = 5
Dict[
    str,
    Union[
        str,
        Dict[
            str,
            Union[
                str,
                List[Dict[str, Union[str, List[str]]]],
                List[Dict[str, Union[int, str]]],
            ],
        ],
        Dict[str, Union[str, List[Dict[str, Union[int, str]]]]],
        Dict[
            str,
            Union[
                str,
                List[Dict[str, Union[str, List[str]]]],
                List[
                    Union[
                        Dict[str, Union[int, str]],
                        Dict[str, str],
                        Dict[str, Union[str, int, Dict[str, int]]],
                    ]
                ],
            ],
        ],
        Dict[
            str,
            Union[
                str,
                Dict[
                    str,
                    Union[
                        str,
                        List[Dict[str, Union[str, List[str]]]],
                        List[Union[Dict[str, Union[int, str]], Dict[str, str]]],
                    ],
                ],
            ],
        ],
    ]
]

More generally, it seems that rewriting Union[Dict[T, X], Dict[T, Y]] as Dict[T, Union[X, Y]] might give you more hits for RewriteLargeUnion (or there may be scenarios in which it would give you fewer).

The other interesting case is nested Unions which could be "flattened". This feels like going in the other direction, letting Unions "bubble up" a type tree. That would potentially give more large unions, and also leads to some deduplication.
e.g.

flattening nested Union example
# original -- max Union arity = 3
Union[
    str,
    List[Dict[str, Union[str, List[str]]]],
    List[Union[Dict[str, Union[int, str]], Dict[str, str]]],
]

# rewrite -- max Union arity = 4
Union[
    str,
    List[Dict[str, str]],
    List[Dict[str, List[str]]],
    List[Dict[str, int]],
]

# extra rewrite -- how would you decide to do this?
# results in the shortest type by number of characters
Union[
    str,
    List[Dict[str, Union[str, List[str], int]]],
]

Maybe the way to think of this is that a union in a generic can be "unpacked". Union[T, Generic[Union[X, Y]]] matches Union[T, Generic[X], Generic[Y]].
This feels more useful, but I suspect there are degenerate cases in which it would result in a single massive Union at the root of a type tree.

@carljm
Copy link
Contributor

carljm commented Jul 10, 2020

We actually already have RewriteConfigDict rewriter also in the default config, which should do precisely the Union[Dict[T, X], Dict[T, Y]] -> Dict[T, Union[X, Y]] transform in your first example. Interesting that there seem to be a couple deeply-nested cases in your example annotation where that should have applied but didn't. Maybe there's a bug in it that's causing it to not fully traverse the nested types, or maybe it's a case where we'd have had to repeatedly apply the transform until a fixed point, which we don't currently do.

The transforms in your second example look significantly harder to pull off algorithmically, but are probably possible; if you come up with a working rewriter we'd definitely look at it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants