-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search is case sensitive for non-ASCII messages #5052
Comments
that would double the space needed for stored text, maybe okay when compared to what blobs need. what would be another consideration for having an extra field for searching is that we could also add other things there - as names of files or webxdc apps - currently, you cannot search for that. appending these information to that said, there is also FULLTEXT and ICU extensions, that may be alternatives (ICU is mentioned in the linked post to be complicated, however, not sure if that also applied to rust). for FULLTEXT, it is still astonishing how far and fast we can go without |
It is complicated, we currently just use vendored SQLCipher and managing our own version of SQLite with SQLCipher and additional extensions is going to make compiling the core more difficult for everyone. There is an issue in |
This migration passes by the way:
So we can use https://www.sqlite.org/fts5.html |
Tried FTS5, but this approach has its own problems. We should use MATCH instead of LIKE to perform a case-insensitive search, but then we lose a capability of matching against parts of words (if we use the unicode61 tokeniser). We can use the
I think the most simple way that doesn't break the current UX also is what @link2xt suggested initially:
Moreover, if a normalised text doesn't differ from the source one or contains only ASCII, we can omit storing it thus reducing the db size. EDIT: Note that matching against parts of words is critical for CJK langs, i don't know if there are any tokenisers for them. |
SQLite search with `LIKE` is case-insensitive only for ASCII chars. To make it case-insensitive for all messages, create a new column `msgs.txt_normalized` defaulting to `NULL` (so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created/updated. When doing a search, search over `IFNULL(txt_normalized, txt)`.
SQLite search with `LIKE` is case-insensitive only for ASCII chars. To make it case-insensitive for all messages, create a new column `msgs.txt_normalized` defaulting to `NULL` (so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created/updated. When doing a search, search over `IFNULL(txt_normalized, txt)`.
SQLite search with `LIKE` is case-insensitive only for ASCII chars. To make it case-insensitive for all messages, create a new column `msgs.txt_normalized` defaulting to `NULL` (so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created/updated. When doing a search, search over `IFNULL(txt_normalized, txt)`.
SQLite search with `LIKE` is case-insensitive only for ASCII chars. To make it case-insensitive for all messages, create a new column `msgs.txt_normalized` defaulting to `NULL` (so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created/updated. When doing a search, search over `IFNULL(txt_normalized, txt)`.
SQLite search with `LIKE` is case-insensitive only for ASCII chars. To make it case-insensitive for all messages, create a new column `msgs.txt_normalized` defaulting to `NULL` (so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created/updated. When doing a search, search over `IFNULL(txt_normalized, txt)`.
In SQLite search with
LIKE
is case-insensitive:deltachat-core-rust/src/context.rs
Line 933 in b779d08
This however does not work for non-english letters. Using functions like
UPPER
also does not help as they only work for ASCII and using a function ontxt
column will prevent using indexes anyway.It is a well-known problem:
https://shallowdepth.online/posts/2022/01/5-ways-to-implement-case-insensitive-search-in-sqlite-with-full-unicode-support/
The solution is to create a new column
txt_normalized
defaulting toNULL
(so we do not bump up the database size in a migration) and storing lowercased/normalized text there when the row is created. When doing a search, search overIFNULL(txt_normalized, txt)
.The text was updated successfully, but these errors were encountered: