Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a doc anchor is not encoded correctly #867

Merged
merged 9 commits into from May 10, 2024

Conversation

hironytic
Copy link
Contributor

Bug/issue #458

Summary

There were two problems related with this issue.

  1. anchor in heading is not encoded.
  2. url in references is encoded twice.

Issue Details

When using Japanese heading "テスト", which consists of non-ascii characters of E3 83 86 E3 82 B9 E3 83 88 in UTF-8, like this:

# Access Control

This is the abstract of my article.

- <doc:AccessControl#テスト>

## テスト

This is a test text.

a generated result JSON was like belows:

{
  // …
  "primaryContentSections": [
    {
      "kind": "content",
      "content": [
        // …
        {
          {
            "type": "heading",
            "text": "テスト",
            "level": 2,
            "anchor": "テスト" // anchor is not encoded
        },
        // …
      ],
    }
  ],
  // …
  "references": {
    "doc://org.swift.docc.example/documentation/TechnologyX/AccessControl#%E3%83%86%E3%82%B9%E3%83%88": {
      "kind": "section",
      "abstract": [ ],
      "url": "/documentation/technologyx/accesscontrol#%25E3%2583%2586%25E3%2582%25B9%25E3%2583%2588", // url is encoded twice
      "type": "topic",
      "identifier": "doc://org.swift.docc.example/documentation/TechnologyX/AccessControl#%E3%83%86%E3%82%B9%E3%83%88",
      "title": "テスト"
    },
    // …
  }
}

Fixes

1. anchor in heading is not encoded.

When creating RenderBlockContent.heading, fix that encoded anchor should be passed to anchor.

This solution is the suggested one in the forum:
https://forums.swift.org/t/can-docc-auto-generate-anchors-be-used-in-non-ascii-strings/62295/8

2. url in references is encoded twice.

In NodeURLGenerator.urlForReference(_:fileSafePath:), url.fragment, which was already encoded by URL, was passed to withFragment(_:), and there it was passed to URLComponents and encoded again.

Fixed to pass the fragment that removed encoding to withFragment(_:).

Dependencies

None

Testing

Added two test cases to check the above two fixes.
Each test case is failed before the fixes, and passed after the fixes.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • [ ] Updated documentation if necessary

@d-ronnqvist
Copy link
Contributor

Apologies for the delay. I plan to look at this next week.

@d-ronnqvist
Copy link
Contributor

Sorry. I was out sick for a week and needed another week to catch up on everything. I'll review this right away.

Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

The only change that I'd ask for is this change is tested. (I left two comments on alternative ways to implement the same thing but those are very minor).

Because this issue is about how anchor of the header is encoded and how a link to that header is encoded in the references section I think it would be good to test a small article with a heading and a link to that heading and verify that the rendered link and heading both encode the anchor correctly.

A test like this, which build a page and "renders" it, invovle a few more pieces and can be hard to write the first time so I'll point out the central pieces below.

Taking inspiration from a few other tests, it's possible to define a small catalog inline in a test like this:

let catalogURL = try createTempFolder(content: [
    Folder(name: "unit-test.docc", content: [
        TextFile(name: "Root.md", utf8Content: """
        # My root page
        
        This single article defines two headings and links to them
        
        ### テスト
        - <doc:#テスト>
        
        ### Some heading
        - <doc:#Some-heading>
        """),
    ])
])
let (_, bundle, context) = try loadBundle(from: catalogURL)

Since this catalog only has one page, you can get the only (root) page from the context using context.soleRootModuleReference and then get the model object for that page and render it using something like this:

let reference = try XCTUnwrap(context.soleRootModuleReference)
let node = try context.entity(with: reference)
let renderContext = RenderContext(documentationContext: context, bundle: bundle)
let converter = DocumentationContextConverter(bundle: bundle, context: context, renderContext: bundle)
let renderNode = try XCTUnwrap(converter.renderNode(for: node, at: nil))

Now renderNode holds the full representation of that rendered page, which contains both the headings and the information about the links.

You can extract only the headings from the page's content using something like:

let contentSection = try XCTUnwrap(renderNode.primaryContentSections.first as? ContentRenderSection)
let headings: [RenderBlockContent.Heading] = contentSection.content.compactMap {
    if case .heading(let heading) = $0 {
        return heading
    } else {
        return nil
    }
}

Information about the links (and other things that the page references) can be found among the render node's "references". For example, you find information about the referenced "テスト" heading like this:

let testTopic = try XCTUnwrap(renderNode.references["doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88"] as? TopicRenderReference)

@@ -171,7 +171,7 @@ public struct NodeURLGenerator {
)
} else {
let url = baseURL.appendingPathComponent(safePath, isDirectory: false)
return url.withFragment(reference.url.fragment)
return url.withFragment(reference.url.fragment?.removingPercentEncoding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference stores the unencoded fragment. We can pass that here.

Suggested change
return url.withFragment(reference.url.fragment?.removingPercentEncoding)
return url.withFragment(reference.fragment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't notice that it has the unencoded fragment. I've fixed it.

Comment on lines 49 to 53
let fragment = urlReadableFragment(heading.plainText)
var components = URLComponents()
components.fragment = fragment
let anchor = components.url?.fragment ?? fragment
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: anchor))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shorter way of encoding the fragment, without needing to handle the optional url, would be to call .addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) on the fragment

Suggested change
let fragment = urlReadableFragment(heading.plainText)
var components = URLComponents()
components.fragment = fragment
let anchor = components.url?.fragment ?? fragment
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: anchor))]
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: urlReadableFragment(heading.plainText).addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed)))]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. But the reference url is encoded by using URLComponents here.

var components = URLComponents()
components.scheme = ResolvedTopicReference.urlScheme
components.host = bundleIdentifier
components.path = path
components.fragment = fragment
self.url = components.url!

Is there no difference between URLComponents and addingPercentEncoding(withAllowedCharacters:.urlFragmentAllowed)? If there are any character that will be encoded on the one hand and not be on the other hand, the link will be broken. I'm afraid of that. How do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm aware they do the same thing and the documentation for URLComponents/percentEncodedFragment recommends that the caller uses .addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) for compatibility if they want to assign an already encoded fragment:

[...] Although ‘;’ is a legal path character, it is recommended that it be percent-encoded for best compatibility with URL (String.addingPercentEncoding(withAllowedCharacters:) will percent-encode any ‘;’ characters if you pass CharacterSet.urlFragmentAllowed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll modify the code as per your suggestion.

@hironytic
Copy link
Contributor Author

Oh are you feeling better now? I also couldn't make time on April. I was working so hard because there was a big milestone in a company project. So don't mind.

Then thanks for your reviewing.
I added a new test case following your advice and place it in a new file. But I'm afraid that there is more suitable locatiion to place that file or should I write the test case in some other existing file? If so feel free to let me know.

@@ -77,6 +77,19 @@ class NodeURLGeneratorTests: XCTestCase {
XCTAssertEqual(generator.urlForReference(classIdentifier).absoluteString, "file:///path/to/bundle/_testbundle-ctlj/products/documentation.builtbundle/com.example.testbundle/data/folder/_privateclass/_privatesubclass")
}

func testsafeURLWithEncodableFragment() throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this test because we can't verify that it's created the resolved topic reference in the same way that the context is. This means that this test could pass while the end-to-end experience is still broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've removed it.

let source = """
## テスト

Note that テスト consists of Non-ASCII characters of E3 83 86 E3 82 B9 E3 83 88 in UTF-8.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor: Rather than this content in the markup, this information could be more useful near the heading.anchor assertion.

XCTAssertEqual(heading.anchor, "%E3%83%86%E3%82%B9%E3%83%88", "The UTF-8 representation of テスト is E3 83 86 E3 82 B9 E3 83 88")

If could also help explain the test assertion if there was a second assertion that verified that it matched the percent escaped value:

XCTAssertEqual(heading.anchor, "テスト".addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed))

That way someone who reads this test can understand that %E3%83%86%E3%82%B9%E3%83%88 is the percent encoded fragment.

Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

I would just remove testsafeURLWithEncodableFragment()—since the other test verifies how fragments are encoded for references to headings created by the documentation context—and then this would be ready to merge.

Also apply suggestions from code review.
/*
This source file is part of the Swift.org open source project

Copyright (c) 2021-2024 Apple Inc. and the Swift project authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is a new file

Suggested change
Copyright (c) 2021-2024 Apple Inc. and the Swift project authors
Copyright (c) 2024 Apple Inc. and the Swift project authors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👌🏻

import Foundation
import XCTest
@testable import SwiftDocC
@_spi(FileManagerProtocol) import SwiftDocCTestUtilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This SPI import isn't needed anymore after we moved the type to use package access.

Suggested change
@_spi(FileManagerProtocol) import SwiftDocCTestUtilities
import SwiftDocCTestUtilities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@d-ronnqvist
Copy link
Contributor

@swift-ci please test

XCTAssertEqual(headings[1].anchor, "Some-heading")

// Check links to them
let testTopic0 = try XCTUnwrap(renderNode.references["doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88"] as? TopicRenderReference)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line fails in the CI

Tests/SwiftDocCTests/Rendering/HeadingAnchorTests.swift:54: 
error: HeadingAnchorTests.testEncodeHeadingAnchor : 
XCTUnwrap failed: expected non-nil value of type "TopicRenderReference"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I have no idea. The test passes on my local Mac environment. But the test in the CI fails on both Mac and Linux.
The failed line is based on that you told me here. Can the key of renderNode.references have different values depending on the environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reproduced this on linux docker image.
Currently all what I know is that the link which has doc:#テスト in its destination, is dropped by this guard. But it is not dropped on my local macOS.

// Before attempting to resolve the link, we confirm that is has a ResolvedTopicReference urlScheme
guard ResolvedTopicReference.urlHasResolvedTopicScheme(URL(string: destination)) else {

Because URL(string: "doc:#テスト")) returns nil on linux docker image and returns an object on local macOS.
But the result of the CI suggests it can also happen on macOS under certain conditions.

How do you think we should do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix that.
To my understanding, we should use ValidatedURL(parsingAuthoredLink:) rather than URL(string:) to get a url of the link destination because it is author provided documentation one.
ValidatedURL(parsingAuthoredLink:) is that already used by resolveTopicReference(_:) which is called in visitLink(_:).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. I would have expected that by the time the link is rendered it would have already been "resolved" which should replace the authored destination with the url value from the ResolvedTopicReference. 🤔

In other words; I would expect the destination to have be "doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88" by this point instead of "doc:#テスト".

I wonder if there's a code path that doesn't do that. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be resolved here in same visitLink(_:)

guard let destination = link.destination, let resolved = resolveTopicReference(destination) else {

However before reaching that, the destination is checked here (28 lines above that point) and URL(string:) fails.

// Before attempting to resolve the link, we confirm that is has a ResolvedTopicReference urlScheme
guard ResolvedTopicReference.urlHasResolvedTopicScheme(URL(string: destination)) else {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing that investigative work.

It looks like you've found another, unrelated, bug where links in synthesized root pages don't resolve their links in the intended manner. Unless you really want to, I don't think it's necessary to fix both bugs in the same PR.

This other bug is rather specific to links in synthesized root pages so it has a smaller impact and could be addressed separately. Anchors would still be encoded correctly—with your changes—in all other sources of documentation.

To get the test to pass you can explicitly mark it as a technology root (instead of relying on DocC to infer it as a root). To do this you'd add a @TechnologyRoot directive in the markup:

  TextFile(name: "Root.md", utf8Content: """
  # My root page

  This single article defines two headings and links to them

+ @Metadata {
+   @TechnologyRoot
+ }

  ### テスト
  - <doc:#テスト>

  ### Some heading
  - <doc:#Some-heading>
  """),

If you do this you can revert the change to treat link destinations as authored during rendering. Links are intended to always be resolved before rendering happens and I would prefer to not add any more code that relies on resolving links during rendering.

If you're curious I can go into more detail on Friday or next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it! I did as you told me. 😃

@d-ronnqvist d-ronnqvist self-requested a review May 8, 2024 15:23
@d-ronnqvist
Copy link
Contributor

@swift-ci please test

@d-ronnqvist
Copy link
Contributor

@swift-ci please test

Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for fixing this.

This looks good to me.

I'll merge this PR and then open a cherry-pick PR for the 6.0 release.

@d-ronnqvist d-ronnqvist merged commit f390891 into apple:main May 10, 2024
2 checks passed
d-ronnqvist pushed a commit to d-ronnqvist/swift-docc that referenced this pull request May 10, 2024
* Fix a doc anchor is not encoded correctly

* Add a test for rendering heading anchors

* Pass stored unencoded fragment

* Remove the unsuitable test case

Also apply suggestions from code review.

* Apply suggestions from code review

* Treat a link destination as the "authored link"

* Revert "Treat a link destination as the "authored link""

This reverts commit 4e3f8da.

* Mark the test article as a technology root
@hironytic
Copy link
Contributor Author

Thank you for your help. Encoding problem is quite a hassle, but I hope you don't dislike Japanese characters.😉

@hironytic hironytic deleted the fix-doc-anchor branch May 10, 2024 10:18
d-ronnqvist added a commit that referenced this pull request May 10, 2024
* Fix a doc anchor is not encoded correctly

* Add a test for rendering heading anchors

* Pass stored unencoded fragment

* Remove the unsuitable test case

Also apply suggestions from code review.

* Apply suggestions from code review

* Treat a link destination as the "authored link"

* Revert "Treat a link destination as the "authored link""

This reverts commit 4e3f8da.

* Mark the test article as a technology root

Co-authored-by: Hironori Ichimiya <[email protected]>

rdar://127870203
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants