enhance search API #3658

Open: wants to merge 5 commits into master

matthias314 (Contributor) commented Feb 8, 2025

This PR is based on #3575 and therefore a draft at present. It has the following components:

  • additional methods for searching text, for example methods that also return matched capturing groups,
  • a new type RegexpGroup that combines a regexp with its padded versions, as used in #3575 (match beginning and end of line correctly),
  • processing Deltas in ExecuteTextEvent in reverse order, which makes replaceall easier to implement,
  • new functions LocVoid() and Loc.IsVoid() to deal with unused submatches.
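
The padding idea behind RegexpGroup can be sketched on a single line of text. This is a hypothetical illustration, not the code from this PR or #3575: it uses a plain Go string with byte offsets instead of Buffer and Loc, the names regexpGroup, newRegexpGroup and findInLine are made up, and it assumes that a padded version simply requires one arbitrary extra character before and/or after the user's pattern. Searching a slightly larger slice with such a pattern lets ^, $ and \b see the real neighbours of the search region instead of treating its edges as the start or end of the text.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// regexpGroup is a hypothetical stand-in for RegexpGroup: the user's regexp
// plus three padded variants. Bit 0 of the index means "one extra character
// before the region", bit 1 means "one extra character after the region".
type regexpGroup [4]*regexp.Regexp

func newRegexpGroup(s string) (regexpGroup, error) {
	var g regexpGroup
	patterns := [4]string{
		s,
		`(?s:.)(?:` + s + `)`,
		`(?:` + s + `)(?s:.)`,
		`(?s:.)(?:` + s + `)(?s:.)`,
	}
	for i, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return g, err
		}
		g[i] = re
	}
	return g, nil
}

// findInLine returns the byte offsets of the first match of g that lies
// inside line[start:end], or nil. The padding characters are included in
// the searched slice and stripped from the reported match.
func findInLine(g regexpGroup, line string, start, end int) []int {
	padStart, padEnd, mode := start, end, 0
	if start > 0 {
		_, w := utf8.DecodeLastRuneInString(line[:start])
		padStart, mode = start-w, mode|1
	}
	if end < len(line) {
		_, w := utf8.DecodeRuneInString(line[end:])
		padEnd, mode = end+w, mode|2
	}
	sub := line[padStart:padEnd]
	m := g[mode].FindStringIndex(sub)
	if m == nil {
		return nil
	}
	from, to := m[0], m[1]
	if mode&1 != 0 { // drop the leading padding character
		_, w := utf8.DecodeRuneInString(sub[from:])
		from += w
	}
	if mode&2 != 0 { // drop the trailing padding character
		_, w := utf8.DecodeLastRuneInString(sub[:to])
		to -= w
	}
	return []int{padStart + from, padStart + to}
}

func main() {
	g, _ := newRegexpGroup("^ab")
	naive := regexp.MustCompile("^ab").FindStringIndex("abab"[2:])
	fmt.Println(naive != nil, findInLine(g, "abab", 2, 4)) // true []
}
```

Because Go's ^ and $ (without the m flag) only match at the very start and end of the searched text, a pattern that must first consume a padding character can never report ^ at the region start.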

The new types and functions are as follows (UPDATED):

// NewRegexpGroup creates a RegexpGroup from a string
func NewRegexpGroup(s string) (RegexpGroup, error)

// FindDown returns a slice containing the start and end positions
// of the first match of `rgrp` between `start` and `end`, or nil
// if no match exists.
func (b *Buffer) FindDown(rgrp RegexpGroup, start, end Loc) []Loc

// FindDownSubmatch returns a slice containing the start and end positions
// of the first match of `rgrp` between `start` and `end` plus those
// of all submatches (capturing groups), or nil if no match exists.
func (b *Buffer) FindDownSubmatch(rgrp RegexpGroup, start, end Loc) []Loc

// FindUp returns a slice containing the start and end positions
// of the last match of `rgrp` between `start` and `end`, or nil
// if no match exists.
func (b *Buffer) FindUp(rgrp RegexpGroup, start, end Loc) []Loc

// FindUpSubmatch returns a slice containing the start and end positions
// of the last match of `rgrp` between `start` and `end` plus those
// of all submatches (capturing groups), or nil if no match exists.
func (b *Buffer) FindUpSubmatch(rgrp RegexpGroup, start, end Loc) []Loc

// FindAllFunc calls the function `f` once for each match between `start`
// and `end` of the regexp given by `s`. The argument of `f` is the slice
// containing the start and end positions of the match. FindAllFunc returns
// the number of matches plus any error that occurred when compiling the regexp.
func (b *Buffer) FindAllFunc(s string, start, end Loc, f func([]Loc)) (int, error)

// FindAll returns a slice containing the start and end positions of all
// matches between `start` and `end` of the regexp given by `s`, plus any
// error that occurred when compiling the regexp. If no match is found, the
// slice returned is nil.
func (b *Buffer) FindAll(s string, start, end Loc) ([][]Loc, error)

// FindAllSubmatchFunc calls the function `f` once for each match between
// `start` and `end` of the regexp given by `s`. The argument of `f` is the
// slice containing the start and end positions of the match and all submatches
// (capturing groups). FindAllSubmatchFunc returns the number of matches plus
// any error that occurred when compiling the regexp.
func (b *Buffer) FindAllSubmatchFunc(s string, start, end Loc, f func([]Loc)) (int, error)

// FindAllSubmatch returns a slice containing the start and end positions of
// all matches and all submatches (capturing groups) between `start` and `end`
// of the regexp given by `s`, plus any error that occurred when compiling
// the regexp. If no match is found, the slice returned is nil.
func (b *Buffer) FindAllSubmatch(s string, start, end Loc) ([][]Loc, error)

// ReplaceAll replaces all matches of the regexp `s` in the given area. The
// new text is obtained from `template` by replacing each variable with the
// corresponding submatch as in `Regexp.Expand`. The function returns the
// number of replacements made, the new end position and any error that
// occurred during regexp compilation.
func (b *Buffer) ReplaceAll(s string, start, end Loc, template []byte) (int, Loc, error)

// ReplaceAllLiteral replaces all matches of the regexp `s` with `repl` in
// the given area. The function returns the number of replacements made, the
// new end position and any error that occurred during regexp compilation.
func (b *Buffer) ReplaceAllLiteral(s string, start, end Loc, repl []byte) (int, Loc, error)

// ReplaceAllFunc replaces all matches of the regexp `s` with `repl(match)`
// in the given area, where `match` is the slice containing start and end
// positions of the match. The function returns the number of replacements
// made, the new end position and any error that occurred during regexp
// compilation.
func (b *Buffer) ReplaceAllFunc(s string, start, end Loc, repl func(match []Loc) []byte) (int, Loc, error)

// ReplaceAllSubmatchFunc replaces all matches of the regexp `s` with
// `repl(match)` in the given area, where `match` is the slice containing
// start and end positions of the match and all submatches. The function
// returns the number of replacements made, the new end position and any
// error that occurred during regexp compilation.
func (b *Buffer) ReplaceAllSubmatchFunc(s string, start, end Loc, repl func(match []Loc) []byte) (int, Loc, error)

// MatchedStrings converts a slice containing start and end positions of
// matches or submatches to a slice containing the corresponding strings.
func (b *Buffer) MatchedStrings(locs []Loc) []string

// LocVoid returns a Loc strictly smaller than any valid buffer location.
func LocVoid() Loc

// IsVoid returns true if the location l is void.
func (l Loc) IsVoid() bool

The method FindNext is kept. ReplaceRegex is removed in favor of ReplaceAll. The latter is easier to use in Lua scripts.

Currently the simple search functions (FindDown etc.) take a RegexpGroup as an argument to avoid recompiling the regexps. In contrast, FindAll, ReplaceAll and friends take a string argument. Many other variants would be possible. Also, the new search functions ignore the buffer's ignorecase setting and don't wrap around when they hit the end of the search region. I think they are more useful this way in Lua scripts.

You will see that many new internal functions use callback functions. This avoids code duplication. (One has to somehow switch between (*regexp.Regexp).FindIndex() and (*regexp.Regexp).FindSubmatchIndex() in the innermost function that searches each line of the buffer.)

As said before, many details could be modified, but overall I think these functions are very useful for writing scripts. Please let me know what you think.

matthias314 force-pushed the m3/find-func branch 2 times, most recently from 8b80291 to 92b6fba on February 9, 2025 17:11
matthias314 (Contributor, Author)

I've rebased the PR onto master and added NewRegexpGroup to the documentation.

matthias314 marked this pull request as ready for review on February 9, 2025 17:40
matthias314 (Contributor, Author) commented Feb 9, 2025

The latest commit fixes a subtle bug related to the padding of the search region: in the presence of combining characters, a search could end up in an infinite loop. (Try searching backwards for . in the line x⃗y⃗z⃗.)

This bug is also present in #3575, hence in master. If you want, I can backport 88f3cf5 to master. This would require some modification of the commit, so let me know if that's necessary.

matthias314 force-pushed the m3/find-func branch 2 times, most recently from 0d14eae to 88f3cf5 on February 10, 2025 00:05
matthias314 (Contributor, Author)

I've force-pushed a polished version and updated the list of functions at the top of this page. It still fixes the bug mentioned above. In addition, the locations returned for matches and submatches are now guaranteed to cover the runes that matched. The underlying Go regexp functions match runes, which may be part of combining characters like x⃗. The start and end locations are now chosen so that the characters between them include all matching runes. This is not the case on master. (Search backwards for . in a row consisting of many x⃗ to see the difference.)

This PR introduces many new functions. Maybe we don't need all of them. For example, do we need a submatch version on top of each non-submatch search or replace function? If we keep only the submatch version, we wouldn't lose any functionality because the additional elements in the slice for each match can be ignored. I nevertheless added all functions to this PR to show what's possible.
