Improve performance when using large files #2656

Open
AndreasArvidsson opened this issue Aug 29, 2024 · 0 comments

AndreasArvidsson commented Aug 29, 2024

When using large files, e.g. a JSON file with 100,000 lines, our scopes that look at the entire file are quite slow. The two main cases for this are:

  1. Tree sitter scope handler
  2. Surrounding pair scope handler

The surrounding pair scope handler could be improved by parsing the file in chunks; a rough sketch of that idea follows. Initially I have been focused on debugging the Tree sitter scope handler and its `matches` function, shown after the sketch.
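A minimal sketch of what the chunked approach might look like, assuming a hypothetical helper `findPairsInText` that scans a plain-text chunk for a balanced pair containing a given offset (all names here are illustrative, not from the actual codebase):

```ts
interface PairOffsets {
  start: number;
  end: number;
}

// Hypothetical pair matcher over a plain-text chunk; returns the offsets of
// the nearest balanced pair containing `offset`, or undefined if the chunk
// has no such pair.
declare function findPairsInText(
  chunk: string,
  offset: number,
): PairOffsets | undefined;

// Scan an exponentially growing window around the target offset instead of
// the whole document. In the common case the pair is found in the first,
// small chunk and we never touch the rest of a 26MB file.
function findSurroundingPairChunked(
  text: string,
  offset: number,
  initialChunkSize = 10_000,
): PairOffsets | undefined {
  for (let size = initialChunkSize; ; size *= 2) {
    const chunkStart = Math.max(0, offset - size);
    const chunkEnd = Math.min(text.length, offset + size);
    const pair = findPairsInText(
      text.slice(chunkStart, chunkEnd),
      offset - chunkStart,
    );
    if (pair != null) {
      // Translate chunk-relative offsets back to document offsets.
      return { start: chunkStart + pair.start, end: chunkStart + pair.end };
    }
    if (chunkStart === 0 && chunkEnd === text.length) {
      // The whole document has been scanned without finding a pair.
      return undefined;
    }
  }
}
```

The tricky part this sketch glosses over is that a delimiter whose counterpart lies outside the current chunk (e.g. a string quote opened much earlier) can make a chunk-local answer wrong, so a real implementation would need some way to detect and grow past such cases.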

```ts
matches(
  document: TextDocument,
  start?: Position,
  end?: Position,
): QueryMatch[] {
  return this.query
    // "query.matches": run the tree-sitter query over the parse tree
    .matches(this.treeSitter.getTree(document).rootNode, {
      startPosition: start == null ? undefined : positionToPoint(start),
      endPosition: end == null ? undefined : positionToPoint(end),
    })
    // "map1": convert raw tree-sitter matches into our mutable match type
    .map(
      ({ pattern, captures }): MutableQueryMatch => ({
        patternIdx: pattern,
        captures: captures.map(({ name, node }) => ({
          name,
          node,
          document,
          range: getNodeRange(node),
          insertionDelimiter: undefined,
          allowMultiple: false,
          hasError: () => isContainedInErrorNode(node),
        })),
      }),
    )
    // "filter": drop matches that fail any of their pattern's predicates
    .filter((match) =>
      this.patternPredicates[match.patternIdx].every((predicate) =>
        predicate(match),
      ),
    )
    // "map2": merge captures that share a normalized name
    .map((match): QueryMatch => {
      // Merge the ranges of all captures with the same name into a single
      // range and return one capture with that name. We consider captures
      // with names `@foo`, `@foo.start`, and `@foo.end` to have the same
      // name, for which we'd return a capture with name `foo`.
      const captures: QueryCapture[] = Object.entries(
        groupBy(match.captures, ({ name }) => normalizeCaptureName(name)),
      ).map(([name, captures]) => {
        captures = rewriteStartOfEndOf(captures);
        const capturesAreValid = checkCaptureStartEnd(
          captures,
          ide().messages,
        );
        if (!capturesAreValid && ide().runMode === "test") {
          throw new Error("Invalid captures");
        }
        return {
          name,
          range: captures
            .map(({ range }) => range)
            .reduce((accumulator, range) => range.union(accumulator)),
          allowMultiple: captures.some((capture) => capture.allowMultiple),
          insertionDelimiter: captures.find(
            (capture) => capture.insertionDelimiter != null,
          )?.insertionDelimiter,
          hasError: () => captures.some((capture) => capture.hasError()),
        };
      });
      return { ...match, captures };
    });
}
```

I added a time log between each step in the return statement; a minimal sketch of the instrumentation is below, followed by the result for "take key" in a 26 MB JSON file.
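The logging code itself isn't shown here, so the following is an assumed reconstruction: a small helper wrapping each pipeline stage in `console.time`/`console.timeEnd`, which prints in exactly the `label: duration` shape seen below.

```ts
// Hypothetical timing wrapper; `timed` and the usage below are assumptions,
// not the actual instrumentation from the codebase.
function timed<T>(label: string, fn: () => T): T {
  console.time(label);
  try {
    return fn();
  } finally {
    console.timeEnd(label);
  }
}

// Usage inside matches(), wrapping each stage, e.g.:
//   const rawMatches = timed("TreeSitterQuery.matches: query.matches", () =>
//     this.query.matches(rootNode, { startPosition, endPosition }),
//   );
```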

```
TreeSitterQuery.matches: query.matches: 4.579s
TreeSitterQuery.matches: map1: 3.303s
TreeSitterQuery.matches: filter: 160.087ms
TreeSitterQuery.matches: map2: 4.269s
TreeSitterQuery.matches: 12.313s
Tree sitter generateScopeCandidates: 12.917s
collectionKey getContainingScopeTarget: 12.985s
```

It's clear that the Tree sitter query match itself takes a significant amount of time. This could be improved by not checking all patterns/captures and instead only evaluating the ones that actually match the requested scope type. It's also clear that our post-processing in TypeScript is quite slow; I'm referring to the two map stages, which together take even longer than the query itself.
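One possible shape for the first idea, sketched under assumptions: compile and cache a smaller query per scope type, built from only the patterns that capture that scope, so `query.matches` never evaluates unrelated patterns. The partitioning of the `.scm` sources (`patternSourcesByScope`) is hypothetical, and the exact web-tree-sitter import/construction varies by version:

```ts
import { Language, Query } from "web-tree-sitter";

// Cache of compiled per-scope queries, keyed by scope type.
const queryCache = new Map<string, Query>();

function getQueryForScope(
  language: Language,
  scopeType: string,
  // Hypothetical: query source text partitioned by the scope it captures.
  patternSourcesByScope: Map<string, string>,
): Query | undefined {
  const cached = queryCache.get(scopeType);
  if (cached != null) {
    return cached;
  }
  const source = patternSourcesByScope.get(scopeType);
  if (source == null) {
    return undefined;
  }
  // Compiling a query is not free, but it happens once per scope type
  // rather than once per command.
  const query = language.query(source);
  queryCache.set(scopeType, query);
  return query;
}
```

For the two map stages, one option might be to make the eagerly computed per-capture fields (e.g. `range`) lazy, so they are only materialized for matches that survive the filter stage.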

AndreasArvidsson changed the title *Improved performance when using large files* to *Improve performance when using large files* on Aug 29, 2024