Consider avoiding new tokens child class #35

kbenoit · 2023-11-17T17:17:40Z

Right now, each proximity computation uses a single pattern, and the result is stored as the docvar proximity, which is a list of distances. This has some drawbacks.

It allows a single proximity distance for each document, with a hardwired pattern - even when that pattern is not fixed and therefore could match multiple elements. This would be similar to implementing kwic() in this way, rather than generating a new object with a different structure.
Having proximities as list elements for a tokens type means these can become wrong if the tokens are modified in any way, through tokens_wordstem/tolower/remove() etc. (A solution like kwic() to generate a new object would avoid this.)
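The second drawback can be illustrated with a small base R sketch. It does not use quanteda or this package: the tokens are a plain list of character vectors, `toy_proximity()` is a hypothetical helper, and the distance definition (absolute offset to the nearest keyword) is an assumption made only for illustration.

```r
# Toy stand-in for a tokens object: one character vector per document.
toks <- list(d1 = c("the", "Turkish", "president", "visited", "the", "city"))

# Hypothetical helper: absolute offset from each token to the nearest
# keyword match. The package's real distance definition may differ.
toy_proximity <- function(x, keyword) {
  hits <- which(tolower(x) == tolower(keyword))
  if (length(hits) == 0L) {
    return(rep(NA_real_, length(x)))
  }
  vapply(seq_along(x), function(i) as.numeric(min(abs(i - hits))), numeric(1))
}

# Store the distances next to the tokens, like a docvar holding a list.
proximity <- lapply(toks, toy_proximity, keyword = "turkish")

# Modify the tokens afterwards (drop "the") without touching the distances.
toks_modified <- lapply(toks, function(x) x[x != "the"])

n_tokens <- lengths(toks_modified)[["d1"]]
n_distances <- lengths(proximity)[["d1"]]
# The two counts no longer agree, so the stored distances are misaligned.
```

Generating a separate result object, as kwic() does, would sidestep this because nothing downstream could change the tokens underneath the stored distances.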

chainsawriot · 2023-11-17T19:59:48Z

@kbenoit Thank you for the suggestion. This is also the reason for #33. I think your suggestion for making tokens_with_proximity a class of its own (not a sister of tokens) is really good.

Make tokens_with_proximity a class of its own

koheiw · 2023-11-18T01:35:16Z

I like the way you saved additional information in a list column, but it is true that changes to the tokens object break it. It is worth considering making a token-level meta field official, but for now you should make it impossible to apply tokens_select/remove by giving the object a class like tokens_proximity or something.

chainsawriot · 2023-11-20T10:41:22Z

Make sure the unique tokens_with_proximity class does not work with tokens_select() and friends
Provide tokens.tokens_with_proximity() to convert an object back to tokens for further manipulation
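A rough base R sketch of these two tasks follows. Every name in it (`toy_select()`, `as_toy_tokens()`, the `toy_*` classes) is hypothetical and stands in for the quanteda generics and the package's real classes; it only shows the S3 pattern of refusing a manipulation on the proximity class and offering an explicit conversion back.

```r
# Toy generic standing in for tokens_select(); not the quanteda function.
toy_select <- function(x, ...) UseMethod("toy_select")

toy_select.toy_tokens <- function(x, pattern, ...) {
  structure(lapply(x, function(doc) doc[doc %in% pattern]),
            class = "toy_tokens")
}

# The proximity class is its own class, not a child of toy_tokens,
# and it refuses selection with an explicit error.
toy_select.toy_tokens_with_proximity <- function(x, ...) {
  stop("convert back with as_toy_tokens() before selecting tokens",
       call. = FALSE)
}

# Explicit conversion back to plain tokens; the distances are dropped.
as_toy_tokens <- function(x, ...) UseMethod("as_toy_tokens")

as_toy_tokens.toy_tokens_with_proximity <- function(x, ...) {
  structure(unclass(x)$tokens, class = "toy_tokens")
}

toks <- structure(list(d1 = c("a", "b", "c")), class = "toy_tokens")
prox <- structure(
  list(tokens = unclass(toks), proximity = list(d1 = c(0, 1, 2))),
  class = "toy_tokens_with_proximity"
)

# Selecting directly on the proximity object fails.
blocked <- inherits(try(toy_select(prox, "a"), silent = TRUE), "try-error")

# Converting back first makes further manipulation possible again.
back <- toy_select(as_toy_tokens(prox), "a")
```

Because the conversion discards the distances, a user who modifies the tokens has to recompute proximities afterwards, so stale values cannot survive.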

chainsawriot · 2023-11-20T10:46:41Z

Add a tolower option to tokens_proximity and record it in metadata (default to TRUE?), so that when processing the hardcoded dfm(tolower), we don't need to lowercase again. Ref: Make it 100% compatible with quanteda #27
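One way to picture the idea, as a base R sketch with hypothetical names (`toy_tokens_proximity()`, `toy_features()`, and the `meta` attribute are all assumptions, not the package's API): the constructor lowercases once and records the flag, and a later step consults the flag instead of lowercasing again.

```r
# Hypothetical constructor: lowercase once and record that in metadata.
toy_tokens_proximity <- function(x, tolower = TRUE) {
  if (tolower) {
    x <- lapply(x, base::tolower)
  }
  structure(list(tokens = x),
            meta = list(tolower = tolower),
            class = "toy_tokens_with_proximity")
}

# Hypothetical downstream step: only lowercase if it was not done already.
toy_features <- function(x) {
  toks <- x$tokens
  if (!isTRUE(attr(x, "meta")$tolower)) {
    toks <- lapply(toks, base::tolower)
  }
  sort(unique(unlist(toks, use.names = FALSE)))
}

obj <- toy_tokens_proximity(list(d1 = c("Turkish", "President")))
features <- toy_features(obj)
```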

chainsawriot · 2023-11-20T10:55:55Z

docvars.tokens_with_proximity methods
meta.tokens_with_proximity methods

chainsawriot · 2023-11-20T11:37:53Z

tokens_proximity() works with a tokens_with_proximity object

chainsawriot added a commit that referenced this issue Nov 20, 2023

Make the class unique and add several methods ref #35

50f018e

chainsawriot added a commit that referenced this issue Nov 20, 2023

Make the class unique and add several methods ref #35 (#42)

3fdd505

* Make the class unique and add several methods ref #35
* Make tokens_proximity() still work for changing keywords [no ci]
* Update Doc [no ci]

chainsawriot added a commit that referenced this issue Nov 20, 2023

Add tests ref #35

767b580

chainsawriot added a commit that referenced this issue Nov 20, 2023

Add tolower ref #35 (#43)

f324e6a

* Make `tolower` default for tokens_proximity()
* Update README

chainsawriot mentioned this issue Nov 26, 2023

Own class: tokens_with_tokenvars gesistsa/tokenvars#2

Open

4 tasks
