Should a token's proximity to itself be 1 or 0?

Right now, the matched token itself gets a proximity of 1 and the token adjacent to it gets 2. It seems like these should be 0 and 1 instead.

``` r
library("quanteda")
#> Package version: 4.0.0
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: 12 of 12 threads used.
#> See https://quanteda.io for tutorials and examples.
library("quanteda.proximity")
toks <- tokens(c(d1 = "a b c d e", d2 = "c d e"))
toksp <- tokens_proximity(toks, "b")
toksp$proximity
#> $d1
#> [1] 2 1 2 3 4
#> 
#> $d2
#> [1] 4 4 4
```

This could also be read as inconsistent when there are multiple matches: two adjacent matching tokens each get 1, whereas a token adjacent to a single match gets 2:
``` r
tokens_proximity(toks, pattern = "b|c", valuetype = "regex")$proximity
#> $d1
#> [1] 2 1 1 2 3
#> 
#> $d2
#> [1] 1 2 3
```

<sup>Created on 2023-11-17 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
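
To make the proposed 0/1 convention concrete, here is a minimal base-R sketch of a zero-based version. `proximity_zero_based()` is a hypothetical helper, not part of quanteda.proximity and not how the package computes its values. It takes a plain character vector of tokens and a regex, and returning `NA` for a document with no match is my assumption (the package currently returns 4 for every token of `d2` in the first example).

``` r
# Hypothetical helper (not part of quanteda.proximity): zero-based distance
# from each token to the nearest token matching the regex `pattern`.
proximity_zero_based <- function(tokens, pattern) {
  hits <- grep(pattern, tokens)  # positions of matching tokens
  # Assumption: with no match, return NA instead of a sentinel value
  if (length(hits) == 0) {
    return(rep(NA_integer_, length(tokens)))
  }
  # For each position, take the smallest absolute offset to any match
  vapply(seq_along(tokens), function(i) min(abs(i - hits)), integer(1))
}

d1 <- c("a", "b", "c", "d", "e")
d2 <- c("c", "d", "e")

proximity_zero_based(d1, "b")
#> [1] 1 0 1 2 3
proximity_zero_based(d1, "b|c")
#> [1] 1 0 0 1 2
proximity_zero_based(d2, "b|c")
#> [1] 0 1 2
```

Under this convention a matching token is always 0 and its neighbour is always 1, whether there is one match or several, so the multiple-match case no longer looks inconsistent.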
