Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relaxed_autolinks option causes markdown links with url in title to get parsed incorrectly. #459

Closed
digitalmoksha opened this issue Aug 22, 2024 · 7 comments
Assignees

Comments

@digitalmoksha
Copy link
Collaborator

digitalmoksha commented Aug 22, 2024

When using the auotlinker extension and the relaxed_autolinks option, markdown links such as [https://github.com](https://github.com) get parsed incorrectly, generating

<p>[<a href="https://github.com/%5D(https://github.com/)">https://github.com/](https://github.com/)</a></p>

instead of

<p><a href="https://github.com/">https://github.com/</a></p>

See https://gitlab-org.gitlab.io/ruby/gems/gitlab-glfm-markdown/?text=%5Bhttps%3A%2F%2Fgithub.com%2F%5D(https%3A%2F%2Fgithub.com%2F)

@digitalmoksha digitalmoksha self-assigned this Aug 22, 2024
@kivikakk
Copy link
Owner

Bisect shows this was introduced at bc754e6; i.e. upon my attempt at restoring relaxed_autolinks behaviour after the inline rewrite to fix up all those other autolink edgecases.

There should be a failing unit test for this to prevent a regression; exactly how to restore the behaviour depends on exactly what the desired behaviour is. (i.e. when is an autolink inside brackets good for "relaxed autolinking", vs. not?)

@digitalmoksha
Copy link
Collaborator Author

ah, thanks for bisecting that!

@digitalmoksha
Copy link
Collaborator Author

I've gone back and forth on this, and I don't see a good way to solve this for the general case.

It's easy enough to fix [http://foo.com](http://foo.com) and make sure it generates <a href="http://foo.com">http://foo.com</a>.

But the more general case, [this http://and.com that and http://the.com other](http://foo.com) is more difficult. As it stands it generates

<a href="http://foo.com">this <a href="http://and.com">http://and.com</a> that and <a href="http://the.com">http://the.com</a> other</a>

rather than

<a href="http://foo.com">this http://and.com that and http://the.com other</a>

It's problematic to have links embedded within links.

The original problem we were trying to solve was being able to support matching bracketed text and auto links so that the ending bracket is not pulled in as part of the url. For example

{this http://foo.com}

Without --relaxed-autolinks we get

<p>{this <a href="http://foo.com%7D">http://foo.com}</a></p>

and with it we get

<p>{this <a href="http://foo.com">http://foo.com</a>}</p>

This is the same type of support for ( - (this http://foo.com) generates what you would expect, an auto linked foo.com.

However since [ is an integral part of the markdown link syntax, we would have to do extensive scanning to determine if we're in a properly qualified markdown link. If we are, we want to ignore auto links, and if we're not, we want to autolink.

Here are possible options I see:

  1. Try to properly determine if we're inside a valid link. I think this would require duplicating/extracting parts of the handle_close_bracket function. There is probably a more elegant way to do it, I'm just not seeing it at the moment.

  2. Remove embedded links in links during the HTML rendering process. This would be similar to how we handle embedded strong in a strong - we don't render the interior tag. This would probably break the use of the
    [this <http://and.com> that](http://foo.com) syntax, which does generate a link in a link. But I don't see a value in being able to do that.

  3. Remove support for using bracketed [. Easiest solution. It does remove a syntax that has been supported in GitLab for awhile. It's unclear how much it's used in the wild. Brackets are useful for delineation of information, so I can envision situations where you might have a list like

    - [link1] - something
    - [link2] - next something
    - ...
    
  4. Handle autolinking post-process, which would mean iterating over text nodes and trying to autolink. This would be a big change from what we have now (though maybe comrak did that at one time?), and might break the gfm compatibility.

Unless there is a way to make 1. work, I'm leaning towards 2., then 3. if it doesn't work. 4. is a non-starter.

@digitalmoksha
Copy link
Collaborator Author

Handle autolinking post-process, which would mean iterating over text nodes and trying to autolink. This would be a big change from what we have now (though maybe comrak did that at one time?), and might break the gfm compatibility.

Well, we do handle some autolink processing in postprocess_text_node. So now I'm confused. 😕

@digitalmoksha
Copy link
Collaborator Author

postprocess_text_node now only handles email auto-linking - url/www matching is handled inline.

Created #461

digitalmoksha added a commit that referenced this issue Aug 26, 2024
when using the `relaxed-autolinks` option.

See #459
digitalmoksha added a commit that referenced this issue Aug 26, 2024
when using the `relaxed-autolinks` option.

See #459
@kivikakk
Copy link
Owner

Yeah, 4. is a non-starter because the autolink fixes in #426 meant exactly doing the opposite. Comrak did used to do it that way, because it was easier to implement initially, but produced ~inf discrepancies and edge-cases. It's now exactly inline with upstream; url/www inline, emails post-processed.

digitalmoksha added a commit that referenced this issue Aug 30, 2024
when using the `relaxed-autolinks` option.

See #459
@digitalmoksha
Copy link
Collaborator Author

#461 has been merged, so closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants