[Question]: Retrieve documents instead of databases #6609
Comments
To retrieve a document with a specific ID, pass it in document_ids:
search_doc_id = ['doc_id']
response = rag_object.retrieve(question="What is attention?", dataset_ids=None, document_ids=search_doc_id)
If you are still encountering an exception, double-check that the document ID you are using is correct and exists in your system [1]. |
If you want to list all the chunks of a given document, you could refer to this. |
Thank you for your reply. I need to retrieve only the documents with the specified IDs in the dataset, so that no other documents participate in the retrieval. How can I achieve this? |
Do not leave |
Hello, thank you for your reply. I tried to retrieve only one document in the dataset, but from the result returned by the RAGFlow.retrieve function, it seems that other documents have also been retrieved. What is the reason? Here is my code |
Hello, I have the same issue. Here's my code:
chunks = ragflow.retrieve(
    question="what is the Notch1 Amplification related drugs and its prognosis?",
    dataset_ids=['29d75a781a7511f083190242ac180006'],
    document_ids=['c89d701e1a7c11f088e80242ac180006'],
    page=1,
    page_size=30,
    similarity_threshold=0,
    vector_similarity_weight=0.3,
    top_k=100,
)
However, the first chunk returned is not from the specified document_id 'c89d701e1a7c11f088e80242ac180006':
print(chunks[0].to_json())
{'available': True,
 'content': ' The authors declare that they have no competing interest, sponsorship or funding arrangements relating to their research. Furthermore, the authors.....',
 'create_time': '',
 'create_timestamp': 0.0,
 'dataset_id': '29d75a781a7511f083190242ac180006',
 'document_id': 'c96781421a7c11f088e80242ac180006',  # not the same doc
 'document_name': '',
 'id': '3b922ab0566b4864',
 'important_keywords': [],
 'questions': []}
It appears that the document_ids parameter is not restricting the search to only the specified documents as expected. Other documents are still being retrieved despite the filter. |
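One possible client-side workaround, while the filter does not behave as expected, is to discard any returned chunk whose document_id was not requested. The following is only a sketch: filter_chunks_by_document is a hypothetical helper, not part of the RAGFlow SDK, and it assumes each chunk exposes a document_id attribute, as the to_json() output above suggests. The sample objects are stand-ins, not real SDK chunks.

```python
from types import SimpleNamespace
from typing import Iterable, List


def filter_chunks_by_document(chunks: Iterable, document_ids: Iterable[str]) -> List:
    """Keep only chunks whose document_id is in document_ids (hypothetical helper)."""
    allowed = set(document_ids)
    # Chunks without a document_id attribute are dropped rather than raising.
    return [c for c in chunks if getattr(c, "document_id", None) in allowed]


# Stand-in objects mimicking the chunk fields shown above (assumption, not SDK objects).
sample_chunks = [
    SimpleNamespace(id="3b922ab0566b4864", document_id="c96781421a7c11f088e80242ac180006"),
    SimpleNamespace(id="hypothetical-chunk", document_id="c89d701e1a7c11f088e80242ac180006"),
]
kept = filter_chunks_by_document(sample_chunks, ["c89d701e1a7c11f088e80242ac180006"])
print([c.id for c in kept])
```

Note that filtering after the call can leave fewer than page_size results per page, so this hides the symptom rather than fixing the underlying filter.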
see it: #6752 |
@KevinHuSh This description is misleading and can easily be misinterpreted as allowing dataset_ids to be set to empty. |
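Reading the comment above as saying that dataset_ids must not be left empty, a caller could check its arguments before calling retrieve. This is a minimal sketch under that assumption: build_retrieve_kwargs is a hypothetical helper, and only the parameter names question, dataset_ids, and document_ids are taken from this thread.

```python
from typing import Dict, Optional, Sequence


def build_retrieve_kwargs(
    question: str,
    dataset_ids: Optional[Sequence[str]],
    document_ids: Optional[Sequence[str]] = None,
) -> Dict[str, object]:
    """Build keyword arguments for a retrieve call, rejecting empty dataset_ids."""
    if not dataset_ids:
        # Assumption drawn from the thread: an empty or None dataset_ids is not accepted.
        raise ValueError("dataset_ids must contain at least one dataset ID")
    kwargs: Dict[str, object] = {"question": question, "dataset_ids": list(dataset_ids)}
    if document_ids:
        kwargs["document_ids"] = list(document_ids)
    return kwargs


# Intended use (not executed here): rag_object.retrieve(**kwargs)
kwargs = build_retrieve_kwargs(
    "What is attention?",
    dataset_ids=["29d75a781a7511f083190242ac180006"],
    document_ids=["c89d701e1a7c11f088e80242ac180006"],
)
print(kwargs)
```

Failing early in the caller gives a clearer error than an exception raised from inside the request.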
Self Checks
Describe your problem
Hello, thank you for your attention. I have the following problem: I want to retrieve from a document with a specific ID rather than from the entire dataset. I tried to implement this with RAGFlow.retrieve, but the function raises an exception. How can I achieve this? Here is my code:
search_doc_id = ['doc_id']
response = rag_object.retrieve(question="What is attention?", dataset_ids=None, document_ids=search_doc_id)
This is the RAGFlow.retrieve function description: