This repository was archived by the owner on Feb 11, 2024. It is now read-only.
Should a token's proximity to itself be 1 or 0? #34
Open
Description
Right now, a matched token's proximity to itself is 1, and the token adjacent to it is 2. It seems like these should be 0 and 1.
library("quanteda")
#> Package version: 4.0.0
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: 12 of 12 threads used.
#> See https://quanteda.io for tutorials and examples.
library("quanteda.proximity")
toks <- tokens(c(d1 = "a b c d e", d2 = "c d e"))
toksp <- tokens_proximity(toks, "b")
toksp$proximity
#> $d1
#> [1] 2 1 2 3 4
#>
#> $d2
#> [1] 4 4 4
This could also be read as inconsistent when there are multiple matches: two adjacent matching tokens both get 1, whereas a token adjacent to a single match gets 2:
tokens_proximity(toks, pattern = "b|c", valuetype = "regex")$proximity
#> $d1
#> [1] 2 1 1 2 3
#>
#> $d2
#> [1] 1 2 3
Created on 2023-11-17 with reprex v2.0.2
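To make the proposal concrete, here is a minimal base-R sketch of the zero-based counting I have in mind. `proximity_sketch()` and its `count_from` argument are hypothetical names made up for illustration; this is not how `tokens_proximity()` is implemented. It only does exact matching on a plain character vector, and returning `NA` when nothing matches is my own assumption, not the package's behaviour.

```r
# Hypothetical helper for illustration only; not part of quanteda.proximity.
# tokens:     character vector of tokens for one document
# pattern:    character vector of tokens to match exactly
# count_from: value assigned to a matched token itself (0 = proposed, 1 = current)
proximity_sketch <- function(tokens, pattern, count_from = 0L) {
  hits <- which(tokens %in% pattern)
  if (length(hits) == 0L) {
    # Assumption: no match yields NA rather than a sentinel distance
    return(rep(NA_integer_, length(tokens)))
  }
  vapply(
    seq_along(tokens),
    # distance to the nearest matched position, shifted by count_from
    function(i) as.integer(min(abs(i - hits)) + count_from),
    integer(1)
  )
}

d1 <- c("a", "b", "c", "d", "e")
proximity_sketch(d1, "b")                   # zero-based: the match itself is 0
proximity_sketch(d1, "b", count_from = 1L)  # one-based, as in the output above
proximity_sketch(d1, c("b", "c"))           # multiple matches, zero-based
```

With zero-based counting, a matched token is 0 and its neighbour is 1, so the value is simply the number of positions to the nearest match, with one or several matches.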