New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a doc anchor is not encoded correctly #867
Conversation
Apologies for the delay. I plan to look at this next week. |
Sorry. I was out sick for a week and needed another week to catch up on everything. I'll review this right away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
The only change that I'd ask for is this change is tested. (I left two comments on alternative ways to implement the same thing but those are very minor).
Because this issue is about how anchor of the header is encoded and how a link to that header is encoded in the references section I think it would be good to test a small article with a heading and a link to that heading and verify that the rendered link and heading both encode the anchor correctly.
A test like this, which build a page and "renders" it, invovle a few more pieces and can be hard to write the first time so I'll point out the central pieces below.
Taking inspiration from a few other tests, it's possible to define a small catalog inline in a test like this:
let catalogURL = try createTempFolder(content: [
Folder(name: "unit-test.docc", content: [
TextFile(name: "Root.md", utf8Content: """
# My root page
This single article defines two headings and links to them
### テスト
- <doc:#テスト>
### Some heading
- <doc:#Some-heading>
"""),
])
])
let (_, bundle, context) = try loadBundle(from: catalogURL)
Since this catalog only has one page, you can get the only (root) page from the context using context.soleRootModuleReference
and then get the model object for that page and render it using something like this:
let reference = try XCTUnwrap(context.soleRootModuleReference)
let node = try context.entity(with: reference)
let renderContext = RenderContext(documentationContext: context, bundle: bundle)
let converter = DocumentationContextConverter(bundle: bundle, context: context, renderContext: bundle)
let renderNode = try XCTUnwrap(converter.renderNode(for: node, at: nil))
Now renderNode
holds the full representation of that rendered page, which contains both the headings and the information about the links.
You can extract only the headings from the page's content using something like:
let contentSection = try XCTUnwrap(renderNode.primaryContentSections.first as? ContentRenderSection)
let headings: [RenderBlockContent.Heading] = contentSection.content.compactMap {
if case .heading(let heading) = $0 {
return heading
} else {
return nil
}
}
Information about the links (and other things that the page references) can be found among the render node's "references". For example, you find information about the referenced "テスト" heading like this:
let testTopic = try XCTUnwrap(renderNode.references["doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88"] as? TopicRenderReference)
@@ -171,7 +171,7 @@ public struct NodeURLGenerator { | |||
) | |||
} else { | |||
let url = baseURL.appendingPathComponent(safePath, isDirectory: false) | |||
return url.withFragment(reference.url.fragment) | |||
return url.withFragment(reference.url.fragment?.removingPercentEncoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference stores the unencoded fragment. We can pass that here.
return url.withFragment(reference.url.fragment?.removingPercentEncoding) | |
return url.withFragment(reference.fragment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't notice that it has the unencoded fragment. I've fixed it.
let fragment = urlReadableFragment(heading.plainText) | ||
var components = URLComponents() | ||
components.fragment = fragment | ||
let anchor = components.url?.fragment ?? fragment | ||
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: anchor))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A shorter way of encoding the fragment, without needing to handle the optional url
, would be to call .addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed)
on the fragment
let fragment = urlReadableFragment(heading.plainText) | |
var components = URLComponents() | |
components.fragment = fragment | |
let anchor = components.url?.fragment ?? fragment | |
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: anchor))] | |
return [RenderBlockContent.heading(.init(level: heading.level, text: heading.plainText, anchor: urlReadableFragment(heading.plainText).addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed)))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. But the reference url is encoded by using URLComponents
here.
swift-docc/Sources/SwiftDocC/Model/Identifier.swift
Lines 474 to 479 in 135a55e
var components = URLComponents() | |
components.scheme = ResolvedTopicReference.urlScheme | |
components.host = bundleIdentifier | |
components.path = path | |
components.fragment = fragment | |
self.url = components.url! |
Is there no difference between URLComponents
and addingPercentEncoding(withAllowedCharacters:.urlFragmentAllowed)
? If there are any character that will be encoded on the one hand and not be on the other hand, the link will be broken. I'm afraid of that. How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm aware they do the same thing and the documentation for URLComponents/percentEncodedFragment recommends that the caller uses .addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed)
for compatibility if they want to assign an already encoded fragment:
[...] Although ‘;’ is a legal path character, it is recommended that it be percent-encoded for best compatibility with
URL
(String.addingPercentEncoding(withAllowedCharacters:)
will percent-encode any ‘;’ characters if you passCharacterSet.urlFragmentAllowed
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll modify the code as per your suggestion.
Oh are you feeling better now? I also couldn't make time on April. I was working so hard because there was a big milestone in a company project. So don't mind. Then thanks for your reviewing. |
@@ -77,6 +77,19 @@ class NodeURLGeneratorTests: XCTestCase { | |||
XCTAssertEqual(generator.urlForReference(classIdentifier).absoluteString, "file:///path/to/bundle/_testbundle-ctlj/products/documentation.builtbundle/com.example.testbundle/data/folder/_privateclass/_privatesubclass") | |||
} | |||
|
|||
func testsafeURLWithEncodableFragment() throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this test because we can't verify that it's created the resolved topic reference in the same way that the context is. This means that this test could pass while the end-to-end experience is still broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've removed it.
let source = """ | ||
## テスト | ||
|
||
Note that テスト consists of Non-ASCII characters of E3 83 86 E3 82 B9 E3 83 88 in UTF-8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor: Rather than this content in the markup, this information could be more useful near the heading.anchor
assertion.
XCTAssertEqual(heading.anchor, "%E3%83%86%E3%82%B9%E3%83%88", "The UTF-8 representation of テスト is E3 83 86 E3 82 B9 E3 83 88")
If could also help explain the test assertion if there was a second assertion that verified that it matched the percent escaped value:
XCTAssertEqual(heading.anchor, "テスト".addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed))
That way someone who reads this test can understand that %E3%83%86%E3%82%B9%E3%83%88
is the percent encoded fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
I would just remove testsafeURLWithEncodableFragment()
—since the other test verifies how fragments are encoded for references to headings created by the documentation context—and then this would be ready to merge.
Also apply suggestions from code review.
/* | ||
This source file is part of the Swift.org open source project | ||
|
||
Copyright (c) 2021-2024 Apple Inc. and the Swift project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is a new file
Copyright (c) 2021-2024 Apple Inc. and the Swift project authors | |
Copyright (c) 2024 Apple Inc. and the Swift project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed 👌🏻
import Foundation | ||
import XCTest | ||
@testable import SwiftDocC | ||
@_spi(FileManagerProtocol) import SwiftDocCTestUtilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This SPI import isn't needed anymore after we moved the type to use package
access.
@_spi(FileManagerProtocol) import SwiftDocCTestUtilities | |
import SwiftDocCTestUtilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@swift-ci please test |
XCTAssertEqual(headings[1].anchor, "Some-heading") | ||
|
||
// Check links to them | ||
let testTopic0 = try XCTUnwrap(renderNode.references["doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88"] as? TopicRenderReference) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line fails in the CI
Tests/SwiftDocCTests/Rendering/HeadingAnchorTests.swift:54:
error: HeadingAnchorTests.testEncodeHeadingAnchor :
XCTUnwrap failed: expected non-nil value of type "TopicRenderReference"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I have no idea. The test passes on my local Mac environment. But the test in the CI fails on both Mac and Linux.
The failed line is based on that you told me here. Can the key of renderNode.references
have different values depending on the environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reproduced this on linux docker image.
Currently all what I know is that the link which has doc:#テスト
in its destination, is dropped by this guard. But it is not dropped on my local macOS.
swift-docc/Sources/SwiftDocC/Model/Rendering/RenderContentCompiler.swift
Lines 133 to 134 in a8a36fc
// Before attempting to resolve the link, we confirm that is has a ResolvedTopicReference urlScheme | |
guard ResolvedTopicReference.urlHasResolvedTopicScheme(URL(string: destination)) else { |
Because URL(string: "doc:#テスト"))
returns nil on linux docker image and returns an object on local macOS.
But the result of the CI suggests it can also happen on macOS under certain conditions.
How do you think we should do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to fix that.
To my understanding, we should use ValidatedURL(parsingAuthoredLink:)
rather than URL(string:)
to get a url of the link destination because it is author provided documentation one.
ValidatedURL(parsingAuthoredLink:)
is that already used by resolveTopicReference(_:)
which is called in visitLink(_:)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. I would have expected that by the time the link is rendered it would have already been "resolved" which should replace the authored destination with the url
value from the ResolvedTopicReference
. 🤔
In other words; I would expect the destination to have be "doc://unit-test/documentation/Root#%E3%83%86%E3%82%B9%E3%83%88"
by this point instead of "doc:#テスト"
.
I wonder if there's a code path that doesn't do that. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be resolved here in same visitLink(_:)
guard let destination = link.destination, let resolved = resolveTopicReference(destination) else { |
However before reaching that, the destination is checked here (28 lines above that point) and URL(string:)
fails.
swift-docc/Sources/SwiftDocC/Model/Rendering/RenderContentCompiler.swift
Lines 133 to 134 in a8a36fc
// Before attempting to resolve the link, we confirm that is has a ResolvedTopicReference urlScheme | |
guard ResolvedTopicReference.urlHasResolvedTopicScheme(URL(string: destination)) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing that investigative work.
It looks like you've found another, unrelated, bug where links in synthesized root pages don't resolve their links in the intended manner. Unless you really want to, I don't think it's necessary to fix both bugs in the same PR.
This other bug is rather specific to links in synthesized root pages so it has a smaller impact and could be addressed separately. Anchors would still be encoded correctly—with your changes—in all other sources of documentation.
To get the test to pass you can explicitly mark it as a technology root (instead of relying on DocC to infer it as a root). To do this you'd add a @TechnologyRoot
directive in the markup:
TextFile(name: "Root.md", utf8Content: """
# My root page
This single article defines two headings and links to them
+ @Metadata {
+ @TechnologyRoot
+ }
### テスト
- <doc:#テスト>
### Some heading
- <doc:#Some-heading>
"""),
If you do this you can revert the change to treat link destinations as authored during rendering. Links are intended to always be resolved before rendering happens and I would prefer to not add any more code that relies on resolving links during rendering.
If you're curious I can go into more detail on Friday or next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it! I did as you told me. 😃
@swift-ci please test |
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for fixing this.
This looks good to me.
I'll merge this PR and then open a cherry-pick PR for the 6.0 release.
* Fix a doc anchor is not encoded correctly * Add a test for rendering heading anchors * Pass stored unencoded fragment * Remove the unsuitable test case Also apply suggestions from code review. * Apply suggestions from code review * Treat a link destination as the "authored link" * Revert "Treat a link destination as the "authored link"" This reverts commit 4e3f8da. * Mark the test article as a technology root
Thank you for your help. Encoding problem is quite a hassle, but I hope you don't dislike Japanese characters.😉 |
* Fix a doc anchor is not encoded correctly * Add a test for rendering heading anchors * Pass stored unencoded fragment * Remove the unsuitable test case Also apply suggestions from code review. * Apply suggestions from code review * Treat a link destination as the "authored link" * Revert "Treat a link destination as the "authored link"" This reverts commit 4e3f8da. * Mark the test article as a technology root Co-authored-by: Hironori Ichimiya <[email protected]> rdar://127870203
Bug/issue #458
Summary
There were two problems related with this issue.
anchor
in heading is not encoded.url
in references is encoded twice.Issue Details
When using Japanese heading "テスト", which consists of non-ascii characters of E3 83 86 E3 82 B9 E3 83 88 in UTF-8, like this:
a generated result JSON was like belows:
Fixes
1.
anchor
in heading is not encoded.When creating
RenderBlockContent.heading
, fix that encoded anchor should be passed toanchor
.This solution is the suggested one in the forum:
https://forums.swift.org/t/can-docc-auto-generate-anchors-be-used-in-non-ascii-strings/62295/8
2.
url
in references is encoded twice.In
NodeURLGenerator.urlForReference(_:fileSafePath:)
,url.fragment
, which was already encoded byURL
, was passed towithFragment(_:)
, and there it was passed toURLComponents
and encoded again.Fixed to pass the fragment that removed encoding to
withFragment(_:)
.Dependencies
None
Testing
Added two test cases to check the above two fixes.
Each test case is failed before the fixes, and passed after the fixes.
Checklist
Make sure you check off the following items. If they cannot be completed, provide a reason.
./bin/test
script and it succeeded[ ] Updated documentation if necessary