-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a few issues with infinite recursions if the content contains cyclic curation #898
Changes from all commits
bd6cfb6
dca31c1
07181eb
4307c4e
be1edf5
1bb242f
598abf3
4572d0f
6bac023
82cc806
7ecdc10
3416c99
4b7d98d
229c2ed
421f9fe
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
/* | ||
This source file is part of the Swift.org open source project | ||
Copyright (c) 2024 Apple Inc. and the Swift project authors | ||
Licensed under Apache License v2.0 with Runtime Library Exception | ||
See https://swift.org/LICENSE.txt for license information | ||
See https://swift.org/CONTRIBUTORS.txt for Swift project authors | ||
*/ | ||
|
||
extension DocumentationContext { | ||
/// Options that configure how the context produces node breadcrumbs. | ||
struct PathOptions: OptionSet { | ||
let rawValue: Int | ||
|
||
/// Prefer a technology as the canonical path over a shorter path. | ||
static let preferTechnologyRoot = PathOptions(rawValue: 1 << 0) | ||
} | ||
|
||
/// Finds all finite (acyclic) paths, also called "breadcrumbs", to the given reference in the topic graph. | ||
/// | ||
/// Each path is a list of references that describe a walk through the topic graph from a leaf node up to, but not including, the given `reference`. | ||
/// | ||
/// The first path is the canonical path to the node. The other paths are sorted by increasing length (number of components). | ||
/// | ||
/// > Note: | ||
/// If all paths from the given reference are infinite (cycle back on themselves) then this function will return an empty list, because there are no _finite_ paths in the topic graph from that reference. | ||
/// | ||
/// - Parameters: | ||
/// - reference: The reference to find paths to. | ||
/// - options: Options for how the context produces node breadcrumbs. | ||
/// - Returns: A list of finite paths to the given reference in the topic graph. | ||
func finitePaths(to reference: ResolvedTopicReference, options: PathOptions = []) -> [[ResolvedTopicReference]] { | ||
topicGraph.reverseEdgesGraph | ||
.allFinitePaths(from: reference) | ||
// Graph traversal typically happens from the starting point outwards, but the callers of `finitePaths(to:options:)` | ||
// expect paths going inwards from the leaves to the starting point, excluding the starting point itself. | ||
// To match the caller's expectations we remove the starting point and then flip the paths. | ||
.map { $0.dropFirst().reversed() } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see this pattern in a few places... map, drop first, reversed. Should we consider changing the way paths are returned by DirectedGraph? Or maybe create a utility function inside of DirectedGraph+Paths.swift that does this? Understanding why we need this here without reading the entire implementation of DirectedGraph is a bit difficult. Or maybe you just need one of your ascii art examples here in a commit to make it clear what this code expects from DirectedGraph. |
||
.sorted { (lhs, rhs) -> Bool in | ||
// Order a path rooted in a technology as the canonical one. | ||
if options.contains(.preferTechnologyRoot), let first = lhs.first { | ||
return try! entity(with: first).semantic is Technology | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tangential: Is it really safe here to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The first element doesn't have to be a "technology" (tutorials table of contents page), it just needs to exist in the context. The only portion of this line that raises an error is This means that every element in every path is guaranteed to appear in Unless the topic graph node is virtual and doesn't represent a page, there should always exist a documentation node for that reference in the context. In theory it's possible that there's either a bug in the graph traversal or in the topic graph reverse edges data so that the start of the path is a virtual node but that would be a really bad bug where it's probably better to crash. |
||
} | ||
|
||
return breadcrumbsAreInIncreasingOrder(lhs, rhs) | ||
} | ||
} | ||
|
||
/// Finds the shortest finite (acyclic) path, also called "breadcrumb", to the given reference in the topic graph. | ||
/// | ||
/// The path is a list of references that describe a walk through the topic graph from a leaf node up to, but not including, the given `reference`. | ||
/// | ||
/// > Note: | ||
/// If all paths from the given reference are infinite (cycle back on themselves) then this function will return `nil`, because there are no _finite_ paths in the topic graph from that reference. | ||
/// | ||
/// - Parameter reference: The reference to find the shortest path to. | ||
/// - Returns: The shortest path to the given reference, or `nil` if all paths to the reference are infinite (contain cycles). | ||
func shortestFinitePath(to reference: ResolvedTopicReference) -> [ResolvedTopicReference]? { | ||
topicGraph.reverseEdgesGraph | ||
.shortestFinitePaths(from: reference) | ||
// Graph traversal typically happens from the starting point outwards, but the callers of `shortestFinitePaths(to:)` | ||
// expect paths going inwards from the leaves to the starting point, excluding the starting point itself. | ||
// To match the caller's expectations we remove the starting point and then flip the paths. | ||
.map { $0.dropFirst().reversed() } | ||
.min(by: breadcrumbsAreInIncreasingOrder) | ||
} | ||
|
||
/// Finds all the reachable root node references from the given reference. | ||
/// | ||
/// > Note: | ||
/// If all paths from the given reference are infinite (cycle back on themselves) then this function will return an empty set, because there are no reachable roots in the topic graph from that reference. | ||
/// | ||
/// - Parameter reference: The reference to find reachable root node references from. | ||
/// - Returns: The references of the root nodes that are reachable fro the given reference, or `[]` if all paths from the reference are infinite (contain cycles). | ||
func reachableRoots(from reference: ResolvedTopicReference) -> Set<ResolvedTopicReference> { | ||
topicGraph.reverseEdgesGraph.reachableLeafNodes(from: reference) | ||
} | ||
} | ||
|
||
/// Compares two breadcrumbs for sorting so that the breadcrumb with fewer components come first and breadcrumbs with the same number of components are sorted alphabetically. | ||
private func breadcrumbsAreInIncreasingOrder(_ lhs: [ResolvedTopicReference], _ rhs: [ResolvedTopicReference]) -> Bool { | ||
// If the breadcrumbs have the same number of components, sort alphabetically to produce stable results. | ||
guard lhs.count != rhs.count else { | ||
return lhs.map({ $0.path }).joined(separator: ",") < rhs.map({ $0.path }).joined(separator: ",") | ||
} | ||
// Otherwise, sort by the number of breadcrumb components. | ||
return lhs.count < rhs.count | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2247,7 +2247,7 @@ public class DocumentationContext: DocumentationContextDataProviderDelegate { | |
let automaticallyCurated = autoCurateSymbolsInTopicGraph() | ||
|
||
// Crawl the rest of the symbols that haven't been crawled so far in hierarchy pre-order. | ||
allCuratedReferences = try crawlSymbolCuration(in: automaticallyCurated.map(\.child), bundle: bundle, initial: allCuratedReferences) | ||
allCuratedReferences = try crawlSymbolCuration(in: automaticallyCurated.map(\.symbol), bundle: bundle, initial: allCuratedReferences) | ||
|
||
// Remove curation paths that have been created automatically above | ||
// but we've found manual curation for in the second crawl pass. | ||
|
@@ -2306,19 +2306,18 @@ public class DocumentationContext: DocumentationContextDataProviderDelegate { | |
/// call `removeUnneededAutomaticCuration(_:)` which walks the list of automatic curations and removes | ||
/// the parent <-> child topic graph relationships that have been obsoleted. | ||
/// | ||
/// - Parameter automaticallyCurated: A list of topics that have been automatically curated. | ||
func removeUnneededAutomaticCuration(_ automaticallyCurated: [(child: ResolvedTopicReference, parent: ResolvedTopicReference)]) { | ||
for pair in automaticallyCurated { | ||
let paths = pathsTo(pair.child) | ||
|
||
// Collect all current unique parents of the child. | ||
let parents = Set(paths.map({ $0.last?.path })) | ||
|
||
// Check if the topic has multiple curation paths | ||
guard parents.count > 1 else { continue } | ||
|
||
// The topic has been manually curated, remove the automatic curation now. | ||
topicGraph.removeEdge(fromReference: pair.parent, toReference: pair.child) | ||
/// - Parameter automaticallyCurated: A list of automatic curation records. | ||
func removeUnneededAutomaticCuration(_ automaticallyCurated: [AutoCuratedSymbolRecord]) { | ||
// It might look like it would be correct to check `topicGraph.nodes[symbol]?.isManuallyCurated` here, | ||
// but that would incorrectly remove the only parent if the manual curation and the automatic curation was the same. | ||
// | ||
// Similarly, it might look like it would be correct to only check `parents(of: symbol).count > 1` here, | ||
// but that would incorrectly remove the automatic curation for symbols with different language representations with different parents. | ||
for (symbol, parent, counterpartParent) in automaticallyCurated where parents(of: symbol).count > (counterpartParent != nil ? 2 : 1) { | ||
topicGraph.removeEdge(fromReference: parent, toReference: symbol) | ||
if let counterpartParent { | ||
topicGraph.removeEdge(fromReference: counterpartParent, toReference: symbol) | ||
} | ||
} | ||
} | ||
|
||
|
@@ -2359,20 +2358,39 @@ public class DocumentationContext: DocumentationContextDataProviderDelegate { | |
} | ||
} | ||
|
||
typealias AutoCuratedSymbolRecord = (symbol: ResolvedTopicReference, parent: ResolvedTopicReference, counterpartParent: ResolvedTopicReference?) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for naming this tuple to simplify all the type annotations below... |
||
|
||
/// Curate all remaining uncurated symbols under their natural parent from the symbol graph. | ||
/// | ||
/// This will include all symbols that were not manually curated by the documentation author. | ||
/// - Returns: An ordered list of symbol references that have been added to the topic graph automatically. | ||
private func autoCurateSymbolsInTopicGraph() -> [(child: ResolvedTopicReference, parent: ResolvedTopicReference)] { | ||
var automaticallyCuratedSymbols = [(ResolvedTopicReference, ResolvedTopicReference)]() | ||
linkResolver.localResolver.traverseSymbolAndParentPairs { reference, parentReference in | ||
private func autoCurateSymbolsInTopicGraph() -> [AutoCuratedSymbolRecord] { | ||
var automaticallyCuratedSymbols = [AutoCuratedSymbolRecord]() | ||
linkResolver.localResolver.traverseSymbolAndParents { reference, parentReference, counterpartParentReference in | ||
guard let topicGraphNode = topicGraph.nodeWithReference(reference), | ||
let topicGraphParentNode = topicGraph.nodeWithReference(parentReference), | ||
// Check that the node hasn't got any parents from manual curation | ||
// Check that the node isn't already manually curated | ||
!topicGraphNode.isManuallyCurated | ||
else { return } | ||
|
||
// Check that the symbol doesn't already have parent's that aren't either language representation's hierarchical parent. | ||
// This for example happens for default implementation and symbols that are requirements of protocol conformances. | ||
guard parents(of: reference).allSatisfy({ $0 == parentReference || $0 == counterpartParentReference }) else { | ||
return | ||
} | ||
|
||
guard let topicGraphParentNode = topicGraph.nodeWithReference(parentReference) else { | ||
preconditionFailure("Node with reference \(parentReference.absoluteString) exist in link resolver but not in topic graph.") | ||
} | ||
topicGraph.addEdge(from: topicGraphParentNode, to: topicGraphNode) | ||
automaticallyCuratedSymbols.append((child: reference, parent: parentReference)) | ||
|
||
if let counterpartParentReference { | ||
guard let topicGraphCounterpartParentNode = topicGraph.nodeWithReference(counterpartParentReference) else { | ||
preconditionFailure("Node with reference \(counterpartParentReference.absoluteString) exist in link resolver but not in topic graph.") | ||
} | ||
topicGraph.addEdge(from: topicGraphCounterpartParentNode, to: topicGraphNode) | ||
} | ||
// Collect a single automatic curation record for both language representation parents. | ||
automaticallyCuratedSymbols.append((reference, parentReference, counterpartParentReference)) | ||
} | ||
return automaticallyCuratedSymbols | ||
} | ||
|
@@ -2736,15 +2754,9 @@ public class DocumentationContext: DocumentationContextDataProviderDelegate { | |
return topicGraph.nodes[reference]?.title | ||
} | ||
|
||
/** | ||
Traverse the Topic Graph breadth-first, starting at the given reference. | ||
*/ | ||
func traverseBreadthFirst(from reference: ResolvedTopicReference, _ observe: (TopicGraph.Node) -> TopicGraph.Traversal) { | ||
guard let node = topicGraph.nodeWithReference(reference) else { | ||
return | ||
} | ||
|
||
topicGraph.traverseBreadthFirst(from: node, observe) | ||
/// Returns a sequence that traverses the topic graph in breadth first order from a given reference, without visiting the same node more than once. | ||
func breadthFirstSearch(from reference: ResolvedTopicReference) -> some Sequence<TopicGraph.Node> { | ||
topicGraph.breadthFirstSearch(from: reference) | ||
} | ||
|
||
/** | ||
|
@@ -2866,55 +2878,6 @@ public class DocumentationContext: DocumentationContextDataProviderDelegate { | |
.map { $0.reference } | ||
} | ||
|
||
/// Options to consider when producing node breadcrumbs. | ||
struct PathOptions: OptionSet { | ||
let rawValue: Int | ||
|
||
/// The node is a technology page; sort the path to a technology as canonical. | ||
static let preferTechnologyRoot = PathOptions(rawValue: 1 << 0) | ||
} | ||
|
||
/// Finds all paths (breadcrumbs) to the given node reference. | ||
/// | ||
/// Each path is an array of references to the symbols from the module symbol to the current one. | ||
/// The first path in the array is always the canonical path to the symbol. | ||
/// | ||
/// - Parameters: | ||
/// - reference: The reference to build that paths to. | ||
/// - currentPathToNode: Used for recursion - an accumulated path to "continue" working on. | ||
/// - Returns: A list of paths to the current reference in the topic graph. | ||
func pathsTo(_ reference: ResolvedTopicReference, currentPathToNode: [ResolvedTopicReference] = [], options: PathOptions = []) -> [[ResolvedTopicReference]] { | ||
let nodeParents = parents(of: reference) | ||
guard !nodeParents.isEmpty else { | ||
// The path ends with this node | ||
return [currentPathToNode] | ||
} | ||
var results = [[ResolvedTopicReference]]() | ||
for parentReference in nodeParents { | ||
let parentPaths = pathsTo(parentReference, currentPathToNode: [parentReference] + currentPathToNode) | ||
results.append(contentsOf: parentPaths) | ||
} | ||
|
||
// We are sorting the breadcrumbs by the path distance to the documentation root | ||
// so that the first element is the shortest path that we are using as canonical. | ||
results.sort { (lhs, rhs) -> Bool in | ||
// Order a path rooted in a technology as the canonical one. | ||
if options.contains(.preferTechnologyRoot), let first = lhs.first { | ||
return try! entity(with: first).semantic is Technology | ||
} | ||
|
||
// If the breadcrumbs have equal amount of components | ||
// sort alphabetically to produce stable paths order. | ||
guard lhs.count != rhs.count else { | ||
return lhs.map({ $0.path }).joined(separator: ",") < rhs.map({ $0.path }).joined(separator: ",") | ||
} | ||
// Order by the length of the breadcrumb. | ||
return lhs.count < rhs.count | ||
} | ||
|
||
return results | ||
} | ||
|
||
func dumpGraph() -> String { | ||
return topicGraph.nodes.values | ||
.filter { parents(of: $0.reference).isEmpty } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a Swift function
enumerated()
you could use here, maybe off by 1? Why the explicit call to zip with the range?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
enumerated()
function works very similar to this but it starts at 0. I could have used enumerated and then donelet indexAfter = index + 1
in the loop body but that would introduce a slight risk of usingindex
instead ofindexAfter
which is avoided by only having one index variable in the loop body.