New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Assistants] Filter on names #841
Conversation
@@ -34,6 +35,7 @@ export const load = async ({ url, locals }) => { | |||
...(modelId && { modelId }), | |||
...(!createdByCurrentUser && { userCount: { $gt: 1 } }), | |||
...(user ? { createdById: user._id } : { featured: true }), | |||
...(query && { name: { $regex: query, $options: "i" } }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this filter change, I've created this index https://github.com/huggingface/chat-ui/blob/assistants_search/src/lib/server/database.ts#L88
Please let me know if it is enough (cc: @coyotte508 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to efficiently search, besides the $text
index (more suited to full text), if you want to do it in mongo you need prefixed case-sensitive regexes, and store search tokens.
Eg we have this in moon-landing:
generateSearchTokens(str: string): string[] {
const lcase = str.toLowerCase().trim();
return [
...new Set(
[
lcase.replace(/[-._\s/]/g, ""),
...lcase.split("/").map(x => x.replace(/[-._\s]/g, "")),
...lcase.replace(/[-.\s_/]/g, " ").split(" "),
...lcase
.replace(/[-\s_/]/g, " ")
.split(" ")
.map(x => x.replace(/\./g, "")),
/// ^ do not split on `.` but still remove them from search tokens (simpler for search)
/// allows for searches like "2.7B" to return things like "EleutherAI/gpt-neo-2.7B"
].filter(x => !!x)
),
];
}
You can apply this to the name - probably tweaking it a bit since the strings we process are a bit different - and store the search tokens.
Then create an index on searchTokens: 1, userCount: -1
Then the search, you do { searchTokens: { $all: q.split(/[-._/\s]+/).map(makeRegex) } }
with
const makeRegex = (s: string) => new RegExp("^" + Utils.escapeForRegExp(s));
(which prefixes the regex by using ^
)
At least that's how we do it for the hub. Alterative to this and $text
is using meilisearch 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we plan to only filter on assistant.name
. Would this filter (query && { name: { $regex: query, $options: "i" } }),
be too slow considering that we have 10k+ assistants atm ?
We can definitely create/use moon-landing-like searchTokens
if we also want to include assistant.description
besides assistant.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
searchTokens
are suitable for assistant.name. If we included description we'd have to move to a $text
index instead.
We use searchTokens
on the hub to search org names, model names, user names etc
Would this filter (query && { name: { $regex: query, $options: "i" } }), be too slow considering that we have 10k+ assistants atm ?
As long as the follow up PR is not forgotten - but may as well do it right the first time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gary149 wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coyotte508 the question is:
- Can we start with this simple
name: { $regex: query, $options: "i" }
only onassistant.name
without having to createsearchTokens
? By can we, I mean: would the performance be good enough for the number of assistants we have? Or we don't know how the performance is gonna be without trying it empirically? - However, if we have to use
searchTokens
, then it would make sense to addassistant.description
tosearchTokens
as well asassistant.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO:
- Search name only: use
searchTokens
- Search name & description: use a text index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let's use searchTokens
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handed in #844
using text index on assinstats.name
(fi we wanna add assinstats.description
as well, it would be a simple change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ready for review
const filterOnName = debounce(async (e: Event) => { | ||
const value = (e.target as HTMLInputElement).value; | ||
const newUrl = getHref($page.url, { newKeys: { q: value } }); | ||
await goto(newUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taking advantage of svelte partial hydration without creating explicit API routes for now
src/routes/assistants/+page.svelte
Outdated
<div | ||
class="ml-auto mr-2 flex h-9 w-9 items-center gap-x-2 rounded-full border border-gray-300 px-2 transition-all duration-300 ease-out hover:w-64 md:w-64 dark:border-gray-500" | ||
> | ||
<button class="flex-shrink-0 cursor-default" on:click={() => filterInputEl.focus()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this filterInputEl.focus()
trick is needed for the small/mobiel screen
Co-authored-by: Eliott C. <[email protected]>
<button class="flex-shrink-0 cursor-default" on:click={() => filterInputEl.focus()} | ||
><CarbonSearch class="text-gray-400" /></button | ||
> | ||
<input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get weird behaviour (undefined is the default value then it's not obvious that there're results)
Screen.Recording.2024-02-20.at.14.07.59.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this looks good for now after testing locally! I'd also like to be able to search on substrings and not necessarily whole words but it can come later
superceded by #873 after talking with @gary149 , we decided to use |
yes good call imo |
Filter with `searchTokens`
As suggested here, merged #873 into this branch & will continue from this branch/PR TODOS:
|
Order of steps?
|
TLDR: Filter on assistant's names
Screen.Recording.2024-02-19.at.4.59.19.PM.mov
Tried to make a bit of a nicer UX on mobile using tailwind transitions
screen-20240219-161717.mp4