Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: relative cross-references #28

Closed

Conversation

analog-cbarber
Copy link

This fully implements the proposed feature.

Fixes #27

This fully implements the proposed feature.

Fixes mkdocstrings#27
Fixes mkdocstrings#27

This commit fixes:
 - the regular expression for checking valid python identifiers
 - update of line offset into doc string for error messages
@analog-cbarber
Copy link
Author

I don't think this implementation is actually correct in any case, at least where it is invoked in the collect method, but it would be nice to get some feedback on the relative cross reference syntax.

@analog-cbarber analog-cbarber marked this pull request as draft June 26, 2022 17:03
@analog-cbarber
Copy link
Author

If we were to do something like this, it probably would be better to do the substitution either at the end of the collect method or at the beginning of the render method. For instance, it appears that I can make my own extension to the python handler doing something like this:

class PythonPlusHandler(PythonHandler):
    """Extended version of mkdocstrings Python handler"""
    def render(self, data: Docstring, config: dict) -> str:
        if config.get("relative_crossrefs", True):
            substitute_relative_crossrefs_in_docstring(data)
        return super().render(data, config)


    def get_templates_dir(self, handler: str) -> Path:
        if handler == 'python-plus':
            handler = 'python'
        return super().get_templates_dir(handler)

@analog-cbarber
Copy link
Author

It seems that Griffe is not producing consistent line numbers for doc strings. Sometimes the lineno refers to the start of the docstring and sometimes to the end, so that throws off the computation of accurate source locations for errors.

@analog-cbarber
Copy link
Author

I think the issue with the line numbers for doc strings being off is only when running under python 3.7.

@pawamoy
Copy link
Member

pawamoy commented Jun 27, 2022

Do you happen to test on multiple Python versions? I think it's broken on 3.7.

@analog-cbarber
Copy link
Author

I was just using 3.7 for building my dev environment by default (since that is the min version I support), but when I looked at the Griffe code I saw that there were special cases for 3.7. When I switch to 3.8 there was no strange behavior in the line doc-string numbers.

@pawamoy
Copy link
Member

pawamoy commented Jun 27, 2022

OK, that confirms it, thanks. It seems I couldn't get the tokenize code right :/ I'll try to fix it asap.

Fixes mkdocstrings#27

- Move subsitution to from collect to render
- Warn w/ source location about missing reference after relative path resolution
- Work around issue with bad source information in python 3.7
- Add debug logging
- Fix bug in regular expression
@analog-cbarber analog-cbarber marked this pull request as ready for review July 10, 2022 17:19
@analog-cbarber
Copy link
Author

I added a workaround for the source location problem in Python 3.7, added a check for missing reference after
relative path resolution (with source location), and moved the substitution call into the render method.

@analog-cbarber
Copy link
Author

For use with our projects, I made a package that subclasses the regular python handler to add this feature:

class GarpyPythonHandler(PythonHandler):
    """Extended version of mkdocstrings Python handler
    """

    handler_name: str = __name__.rsplit('.', 2)[1]

    default_config = dict(
        PythonHandler.default_config,
        relative_crossrefs = False,
    )

    def __init__(self,
                 theme: str,
                 custom_templates: Optional[str] = None,
                 config_file_path: Optional[str] = None,
                 paths: Optional[List[str]] = None,
                 **config: Any,
                 ):
        super().__init__(
            handler = self.handler_name,
            theme = theme,
            custom_templates = custom_templates,
            config_file_path = config_file_path,
            paths = paths,
            **config
        )

    def render(self, data: Object, config: dict) -> str:
        final_config = ChainMap(config, self.default_config)

        if final_config["relative_crossrefs"]:
            substitute_relative_crossrefs(data, checkref=self._check_ref)

        return super().render(data, config)

    def get_templates_dir(self, handler: str) -> Path:
        """See [render][.barf]"""
        if handler == self.handler_name:
            handler = 'python'
        return super().get_templates_dir(handler)

    def _check_ref(self, ref:str) -> bool:
        """Check for existence of reference"""
        try:
            self.collect(ref, {})
            return True
        except Exception:  # pylint: disable=broad-except
            # Only expect a CollectionError but we may as well catch everything.
            return False

It would be nice if this merge request were accepted in some form, but making our own handler supports our own use cases.

@pawamoy
Copy link
Member

pawamoy commented Aug 17, 2022

Hello, sorry for the delay, I was dealing with burnout. I'll need to get back into coding/fixing/implementing/releasing stuff before being able to review this. I hope you understand. I'll keep this in my saved notifications anyway.

@analog-cbarber
Copy link
Author

No problem. It goes without saying that life and and day jobs come first!

@pawamoy
Copy link
Member

pawamoy commented Dec 15, 2022

Hey @analog-cbarber, thank you for updating the PR. I did not forget about you. Just busy 😅
Also, I'm still not convinced by this approach. I'd prefer to let it live as an external handler for now, as I believe you've been using it? Is this OK with you you? We don't have to close this PR (unless you want to), let say it's just on hold until I have the time and focus to make up my mind.

@analog-cbarber
Copy link
Author

Even if there is a way to implement this in a more general fashion, I think that would be much more work than I had to spend implementing this, so it might be quite a while before anyone gets around to implementing that.

For our own use, subclassing this handler in our own package works just fine, but the only ones who benefit from that our projects in my company. It isn't worth the trouble for me to publish this extension externally as a separate package on pypi and conda-forge. It seems a shame that no one else can use this feature.

And for us, the more we use this relative cross-reference syntax in our own doc-strings, the more we become dependent on this implementation, so it would be nice to at least get some agreement on whether my syntax is acceptable.

BTW, I will soon push an update to the PR for a (p) syntax for referring to the enclosing package.

@pawamoy
Copy link
Member

pawamoy commented Dec 16, 2022

It isn't worth the trouble for me to publish this extension externally as a separate package on pypi and conda-forge. It seems a shame that no one else can use this feature.

The same way it isn't worth the trouble to you, it isn't worth the trouble to me to take on the maintenance task, especially if I'm not satisfied with the approach, because it would change again later and would impact many more users.

And for us, the more we use this relative cross-reference syntax in our own doc-strings, the more we become dependent on this implementation, so it would be nice to at least get some agreement on whether my syntax is acceptable.

OK, I took some time to think about it. I think it's too complex. Lots of the examples used in the tests can be written in many different ways.

# all equivalent
assert_sub(meth1, "foo", "(c).baz.", "mod1.mod2.Class1.baz.foo")
assert_sub(meth1, "foo", ".baz.", "mod1.mod2.Class1.baz.foo")
# all equivalent
assert_sub(meth1, "foo", ".", "mod1.mod2.Class1.foo")
assert_sub(meth1, "foo", "^.", "mod1.mod2.Class1.foo")
assert_sub(meth1, "foo", "(c).", "mod1.mod2.Class1.foo")
# all equivalent
assert_sub(meth1, "foo", "(m).", "mod1.mod2.foo")
assert_sub(meth1, "foo", "^^.", "mod1.mod2.foo")
# all equivalent
assert_sub(meth1, "foo", "^", "mod1.mod2.Class1")
assert_sub(meth1, "foo", "(c)", "mod1.mod2.Class1")

The syntax seems powerful, but it's also confusing, asking users some mental gymnastics to climb up and down the objects hierarchy, and to learn the meaning of each symbol. It also requires rather complex code to handle it.
If we try and follow the zen of Python, there should be one and only one obvious way to do it.

With that in mind, I ran some tests and came up with a variant of the syntax suggested in mkdocstrings/mkdocstrings#489:

  • . denotes the current object
  • .. denotes the parent of the current object
  • etc.

The different things I tested are shown below, with . denoting the module, parent or current object respectively.

Module

# module_a
class ClassA:
    """
    Reference to [module_a][.]
    Reference to [ClassB][..module_b.ClassB]
    Reference to [method_a][.ClassA.method_a]
    """
    def method_a():
        """Reference to [ClassA][.ClassA]"""

Parent

# module_a
class ClassA:
    """
    Reference to [module_a][.]
    Reference to [ClassB][..module_b.ClassB]
    Reference to [method_a][.ClassA.method_a]
    """
    def method_a():
        """Reference to [ClassA][.]"""

Current object

# module_a
class ClassA:
    """
    Reference to [module_a][..]
    Reference to [ClassB][...module_b.ClassB]
    Reference to [method_a][.method_a]
    """
    def method_a():
        """Reference to [ClassA][..]"""

Using the current object for . seems to yield the most succinct references, so I'd probably go with this syntax if I was to implement support for relative references. I'd still probably not implement it in handlers themselves though. WDYT?

@analog-cbarber
Copy link
Author

Yes, you obviously have to be happy with the behavior.

I actually used .. syntax for parent in my first attempt at implementing this. It does mimic the syntax used for relative imports in Python. I thought that the ^ might be slightly clearer, but using multiple .. is fine. In practice, you will
find that you will end up using [..] much more than [.], which is why I thought it might be nice to have a single
symbol for referring to the parent.

But in using the syntax in our own code I found that I was always doing mental gymnastics to count where I was in the code.
It is easy to understand the parent dereferencing syntax, but when you read an expression like [^^^foo.] (or [....foo]) it does make the reader have to do some work to figure out what that means in that context. That is why I added the
(c) and (m) syntaxes. I found that much easier to read. In practice, after using it for a while in our code, I would probably avoid using more than one level of parent and just resort to explicitly writing out the prefix.

I allowed . to refer to the parent for methods and functions which are not themselves namespaces. The thought
was that it would otherwise not be correct and was probably what the user intended. But supporting that syntax does
tend to get you in the habit of writing [.] to refer to siblings in your class only to have this break when you accidentally
use this syntax in the doc string for something other than a method. So I would not mind dropping that behavior and
always require you to use either [(c).] or [..].

As to the zen of python, if there is to be only one way to reference a symbol we would not even consider this feature because the one obvious way is to write out the entire fully qualified reference every time. In practice, it is rarely true
that there is only one way to do something in Python, and I think that is mostly a good thing. I think it is good to
give users a couple of different ways that they can write their references as long as there is some logic behind it.

As to the implementation, this was the easiest way to do this especially given the nice griffe datastructures.
Implementing in a more general way would require touching multiple repos in concert. That may be something you are comfortable doing but it seemed like too much work for me to take on.

Once nice side-effect of this implementation is the ability to report exact line numbers for errors. One big issue with mkdocstrings is that when you have a large project with some broken references you have to do text searches to
figure out where the errors come from rather than just clicking on a path. Sometimes it has taken me 30 minutes
to fix issues that should have taken 5 minutes to fix if I knew the correct source locations. When the bad references are constructed using the proposed relative syntax it would be even harder to trace those back to the source if you do
not report it until later in the pipeline after the source location has been lost.

So if I replace ^ with .. etc. and drop the (c) etc. syntax? Would that make the PR acceptable?

@pawamoy
Copy link
Member

pawamoy commented Dec 20, 2022

You clearly have put a lot of time and thoughts into this, I can appreciate that.

Implementing in a more general way would require touching multiple repos in concert. That may be something you are comfortable doing but it seemed like too much work for me to take on.

I understand. It requires a bit more efforts and synchronization to handle this at the level of autorefs.

Once nice side-effect of this implementation is the ability to report exact line numbers for errors.

This is indeed something I did not think about initially. I think it's possible to propagate the location information to autorefs.


With all that said, I still want to try and implement that in autorefs + handlers templates.
I know it has been a long time already, so I will ask you to be a bit more patient 😕
Maybe you're already aware (not that you should), but I've been working a lot on Griffe lately, as it's the foundation of the Python handler. The work on mkdocstrings has therefore been delayed by quite some time. There's still work to do on Griffe (handling everybody's projects without failures is really not trivial!), but at least Griffe is more robust now, so hopefully I'll have more time to spend on mkdocstrings.

In the meantime, I would gladly advertise your custom handler in mkdocstrings docs if you ever want to make it public.
I think it's nice when projects offer built-in features but still allow alternatives to be used. This could also potentially help us getting more feedback from users on the syntax, features and implementation.

I'll close this now, I hope you understand. mkdocstrings is gaining some popularity and it will only get more and more difficult to maintain it and I want to avoid merging things if I'm not 100% convinced.

@pawamoy pawamoy closed this Dec 20, 2022
@analog-cbarber
Copy link
Author

Ok. I will look into publishing my handler package somewhere. Because it is corporate code, I have some extra hurdles to navigate. I will probably just publish it on our anaconda account rather than pypi/conda-forge and won't promise to support it forever.

My implementation is a little fragile since it directly subclasses the official handler and relies on internal implementation details.

In the meantime, those who are interested could try building from the branch on my fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relative cross-references
2 participants