Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Attach] Allow semantic searching content node documents #11516

Merged
merged 4 commits into from
Mar 21, 2025

Conversation

philipperolet
Copy link
Contributor

Fixes #2404

Description

As per title and issue.

Extracted from #11489 (cf comments)

Risks

standard

Deploy

front

Fixes #2404

Description
---
As per title and issue.

Extracted from #11489 (cf comments)

Risks
---
standard

Deploy
---
front
@philipperolet philipperolet requested a review from spolu March 21, 2025 08:29
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand.

I thought we wanted to have one action per attached folders?

Is the intent here to allow searching in attached files? Aren't we supposed to change the isSearchable in the list files as well?

How will that work when we start allowing folders?

dataSourceViewId: f.nodeDataSourceViewId,
filter: { parents: null, tags: null },
}));
if (dataSourceView) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this one to make it clear it's the conversation data source right?

.map((f) => ({
workspaceId: auth.getNonNullableWorkspace().sId,
dataSourceViewId: f.nodeDataSourceViewId,
filter: { parents: null, tags: null },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're taking the whole dataSourceView?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I should have the content node id as parent filter, thanks for noticing 👍

@philipperolet
Copy link
Contributor Author

Not sure I understand.

I thought we wanted to have one action per attached folders?

Is the intent here to allow searching in attached files? Aren't we supposed to change the isSearchable in the list files as well?

How will that work when we start allowing folders?

So, indeed, let me explain how I see it:

Previous behaviour, before Attach project:

  • we can include any single document (via list files and the conversation_include_file action)
  • we can semantic search the whole conversation

New behaviour, before folders: the same.
Docs can be content nodes, but the "we can sem search the whole conversation" should work
=> this is the current PR.

New behaviour, after folders: the same:

  • we can include any single document
  • we can include any single folder (thus the 1 sem search action per folder)
  • we can semantic search the whole convo

As for listFiles, it doesn't need an update. Content nodes doc were already marked as searchable, but since we didn't add their datasourceview in the global semantic search action, they weren't included in that search (that's what this PR solves)

const dataSources: DataSourceConfiguration[] =
filesUsableAsRetrievalQuery
.filter((f) => isConversationContentNodeType(f))
// For each searchable content node, we add its datasourceview with itself as parent filter.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could group by datasourceview here but not sure the gain (which seems ~0 to me) is worth the added LOCs and complexity

@philipperolet philipperolet requested a review from spolu March 21, 2025 09:24
@spolu
Copy link
Contributor

spolu commented Mar 21, 2025

New behaviour, after folders: the same:

we can include any single document
we can include any single folder (thus the 1 sem search action per folder)
we can semantic search the whole convo

When you say:

we can include any single folder (thus the 1 sem search action per folder)

It's more we can search no?

As for listFiles, it doesn't need an update. Content nodes doc were already marked as searchable, but since we didn't add their datasourceview in the global semantic search action, they weren't included in that search (that's what this PR solves)

roger that

Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@philipperolet
Copy link
Contributor Author

New behaviour, after folders: the same:

we can include any single document
we can include any single folder (thus the 1 sem search action per folder)
we can semantic search the whole convo

When you say:

we can include any single folder (thus the 1 sem search action per folder)

It's more we can search no?

yes, search indeed

@philipperolet philipperolet merged commit 809c37c into main Mar 21, 2025
6 checks passed
@philipperolet philipperolet deleted the add-semsearch branch March 21, 2025 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Input Bar can take the whole screen
2 participants