This repository has been archived by the owner on Feb 11, 2024. It is now read-only.
@kbenoit Thank you for the suggestion.
There is a reason for counting from 1: by default, each number in the DFM is the sum of 1/proximity, and 1/0 is Inf.
One can either change the weight_function for dfm(), or change count_from for tokens_proximity().
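To show why a zero-based count clashes with the default weighting, here is a minimal base R sketch. It assumes, as described above, that the default weight is 1/proximity. `inverse_weight` and `shifted_weight` are hypothetical helpers for illustration, not functions from quanteda.proximity.

```r
# Proximities for d1 with count_from = 0: the matched token "b" is 0
prox <- c(a = 1, b = 0, c = 1, d = 2, e = 3)

# Hypothetical helper mirroring the default weighting: 1 / proximity
inverse_weight <- function(x) 1 / x

# Hypothetical alternative weight that stays finite at proximity 0
shifted_weight <- function(x) 1 / (x + 1)

inverse_weight(prox)  # the matched token "b" gets Inf
shifted_weight(prox)  # every weight is finite; "b" gets 1
```

Because `shifted_weight(x)` equals `inverse_weight(x + 1)`, counting from 0 with a shifted weight gives the same numbers as counting from 1 with 1/proximity. The two options above differ only in where the offset is applied.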
library(quanteda); library(quanteda.proximity)
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 70.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
toks <- tokens(c(d1 = "a b c d e", d2 = "c d e"))
toksp <- tokens_proximity(toks, "b", count_from = 0)
toksp$proximity
#> $d1
#> [1] 1 0 1 2 3
#> 
#> $d2
#> [1] 3 3 3
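The d1 numbers can be reproduced with a few lines of base R. This minimal sketch assumes that proximity is the absolute positional distance to the matched token plus `count_from`. `proximity_from` is a hypothetical helper, not the package's implementation. It does not attempt to reproduce the value returned for d2, which has no match.

```r
# Hypothetical helper: distance of every token to a single matched position,
# shifted by count_from (assumed to behave like the tokens_proximity() argument)
proximity_from <- function(tokens, target, count_from = 1) {
  pos <- match(target, tokens)  # position of the first match
  if (is.na(pos)) stop("target not found")
  abs(seq_along(tokens) - pos) + count_from
}

d1 <- c("a", "b", "c", "d", "e")
proximity_from(d1, "b", count_from = 0)  # 1 0 1 2 3, as in the reprex
proximity_from(d1, "b")                  # 2 1 2 3 4 with this helper's one-based default
```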
When get_min (take the row minimum) is FALSE, proximity is a matrix with one column per match. (I realize now that the columns should be named and that the number of columns should be consistent across documents. I admit that I didn't pay enough attention to that during development so far.) As explained in the documentation, count_from is not added to the numbers in the matrix.
library(quanteda); library(quanteda.proximity)
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 70.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
toks <- tokens(c(d1 = "a b c d e", d2 = "c d e"))
toksp <- tokens_proximity(toks, pattern = "b|c", valuetype = "regex", get_min = FALSE)
toksp$proximity
#> $d1
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    0    1
#> [3,]    1    0
#> [4,]    2    1
#> [5,]    3    2
#> 
#> $d2
#>      [,1]
#> [1,]    0
#> [2,]    1
#> [3,]    2
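As a sketch of what named per-match columns could look like, the base R code below rebuilds the d1 matrix from the output above and takes the row minimum, which is what get_min = TRUE is described as doing. The column-naming scheme (token and position) and the helper name `proximity_matrix` are assumptions for illustration only.

```r
# Hypothetical helper: one column per match, holding each token's distance
# to that match; count_from is not added, mirroring the documented behaviour
proximity_matrix <- function(tokens, pattern) {
  hits <- grep(pattern, tokens)                   # positions of all matches
  m <- abs(outer(seq_along(tokens), hits, "-"))   # token-by-match distances
  colnames(m) <- paste0(tokens[hits], "@", hits)  # e.g. "b@2", "c@3"
  m
}

d1 <- c("a", "b", "c", "d", "e")
m <- proximity_matrix(d1, "b|c")
m                  # same numbers as the d1 matrix above, with named columns
apply(m, 1, min)   # row minimum: 1 0 0 1 2
```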
Right now, the matched token has proximity 1 and the token adjacent to it has 2. It seems like these should be 0 and 1.
And this could be interpreted as inconsistent if there are multiple matches, since adjacent tokens are now 1 from each other:
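The mismatch can be checked with a small base R sketch for d1 and the pattern b|c. It assumes that the vector output equals the row minimum of the distance matrix plus `count_from`; `min_proximity` is a hypothetical helper. With a one-based count, the adjacent matches "b" and "c" both read 1, the same as their distance from each other, and the matrix reads 0 at each match. A zero-based count makes the two representations agree.

```r
d1 <- c("a", "b", "c", "d", "e")
hits <- grep("b|c", d1)

# Token-by-match distances, as in the get_min = FALSE matrix (no count_from added)
dists <- abs(outer(seq_along(d1), hits, "-"))

# Hypothetical helper: row minimum shifted by count_from, assumed to match get_min = TRUE
min_proximity <- function(dists, count_from) apply(dists, 1, min) + count_from

min_proximity(dists, count_from = 1)  # 2 1 1 2 3: matched tokens "b" and "c" read 1
min_proximity(dists, count_from = 0)  # 1 0 0 1 2: matched tokens read 0
dists[hits, ]                         # the matrix gives 0 for each match at its own column
```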
Created on 2023-11-17 with reprex v2.0.2