This repository has been archived by the owner on Feb 11, 2024. It is now read-only.
Consider avoiding new tokens child class #35
Comments
I like the way you saved additional information in a list column, but it is true that changes in the token object break it. It is worth considering making a token-level meta field official, but you should make it impossible to apply |
chainsawriot added a commit that referenced this issue on Nov 20, 2023
Right now, there is a single pattern for each proximity computation, stored as the docvar `proximity`, which is a list of distances. This has some drawbacks.

- It allows a single proximity distance for each document, with a hardwired pattern, even when that pattern is not fixed and therefore could match multiple elements. This would be similar to implementing `kwic()` in this way, rather than generating a new object with a different structure.
- Having proximities as list elements for a tokens type means these can become wrong if the tokens are modified in any way, through `tokens_wordstem/tolower/remove()` etc. (A solution like `kwic()` to generate a new object would avoid this.)
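
To make the two drawbacks concrete, here is a minimal, language-neutral sketch in Python. It does not use the package's or quanteda's API. The names `distances_to`, `ProximityRow`, and `proximity_table` are hypothetical, and the distance definition (absolute offset to the nearest match, starting at 0) is an assumption made only for illustration.

```python
from dataclasses import dataclass


def distances_to(tokens, target):
    """Distance from each token to the nearest occurrence of `target`.

    Returns a list of None values when `target` does not occur.
    (Assumed definition, for illustration only.)
    """
    hits = [i for i, tok in enumerate(tokens) if tok == target]
    if not hits:
        return [None] * len(tokens)
    return [min(abs(i - h) for h in hits) for i in range(len(tokens))]


tokens = ["the", "quick", "brown", "fox", "jumps"]

# Approach 1: distances kept as side metadata next to the tokens,
# computed once for a single hardwired pattern.
proximity = distances_to(tokens, "fox")

# Any later modification of the tokens leaves the metadata stale:
# the list of distances no longer lines up with the tokens.
filtered = [tok for tok in tokens if tok != "the"]
stale = len(proximity) != len(filtered)


# Approach 2: a kwic-like standalone result, one row per token and pattern.
# It is a snapshot of the tokens at computation time, so it cannot silently
# drift, and it can hold several patterns at once.
@dataclass(frozen=True)
class ProximityRow:
    doc_id: str
    position: int
    token: str
    pattern: str
    distance: int


def proximity_table(doc_id, tokens, patterns):
    rows = []
    for pattern in patterns:
        dists = distances_to(tokens, pattern)
        for i, (tok, dist) in enumerate(zip(tokens, dists)):
            if dist is not None:
                rows.append(ProximityRow(doc_id, i, tok, pattern, dist))
    return rows


table = proximity_table("doc1", tokens, ["fox", "quick"])
```

In this sketch the separate table plays the role that the `kwic()` result plays in the suggestion above: it is derived from the tokens but not attached to them, so token transformations cannot invalidate it in place.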