[Bug]: HTTP API delete_chunks: Inconsistent Behavior: chunk_ids=[] Deletes All Chunks in Infinity But None in Elasticsearch #6607

Closed
4 tasks done
asiroliu opened this issue Mar 27, 2025 · 2 comments
Labels
🐞 bug Something isn't working, pull request that fix bug. ✅ verified

Comments

@asiroliu
Contributor

asiroliu commented Mar 27, 2025

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

x

RAGFlow image version

166d24a3c485(infiniflow/ragflow:nightly)

Other environment information

Actual behavior

When passing an empty array chunk_ids=[] to the delete chunks API:

With DOC_ENGINE=infinity: Deletes ALL chunks under the document
With DOC_ENGINE=elasticsearch: Performs NO deletion
This inconsistency between engines may lead to accidental data loss.

Expected behavior

No response

Steps to reproduce

1. Deploy RAGFlow with DOC_ENGINE=infinity

DOC_ENGINE=infinity docker compose -f docker/docker-compose.yml up -d

2. Delete chunks with chunk_ids=[]

import requests
# dataset_id and document_id must refer to an existing document that has chunks
payload = {"chunk_ids": []}
requests.delete(f'http://127.0.0.1:9380/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks', json=payload)


3. The returned response is as follows:

{'code': 102, 'message': 'rm_chunk deleted chunks 5, expect 0'}


4. Deploy RAGFlow with DOC_ENGINE=elasticsearch

docker compose -f docker/docker-compose.yml up -d

5. Delete chunks with chunk_ids=[]

payload = {"chunk_ids": []}
requests.delete(f'http://127.0.0.1:9380/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks', json=payload)


6. The returned response is as follows:

{'code': 0, 'message': 'delete 0 chunks'}
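
The two responses above can be compared side by side. The following is a minimal sketch that assumes only the two response shapes shown in the steps; the helper name `summarize_delete_response` is hypothetical and is not part of RAGFlow.

```python
import re


def summarize_delete_response(resp: dict) -> dict:
    """Extract a success flag and the deleted count from a delete-chunks response.

    Hypothetical helper: it only understands the two message shapes
    quoted in this report.
    """
    message = resp.get("message", "")
    # Shapes seen above: 'rm_chunk deleted chunks 5, expect 0' and 'delete 0 chunks'
    match = re.search(r"deleted chunks (\d+)", message) or re.search(
        r"delete (\d+) chunks", message
    )
    deleted = int(match.group(1)) if match else None
    return {"ok": resp.get("code") == 0, "deleted": deleted}


# Responses copied from the reproduction steps
infinity_resp = {"code": 102, "message": "rm_chunk deleted chunks 5, expect 0"}
es_resp = {"code": 0, "message": "delete 0 chunks"}

infinity_summary = summarize_delete_response(infinity_resp)
es_summary = summarize_delete_response(es_resp)
print(infinity_summary)
print(es_summary)
```

With the same request, Infinity reports 5 chunks deleted and an error code, while Elasticsearch reports 0 chunks deleted and success.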

Additional information

No response

@asiroliu asiroliu added the 🐞 bug Something isn't working, pull request that fix bug. label Mar 27, 2025

dosubot bot commented Mar 27, 2025

@asiroliu Hello there! I'm here to help you with any bugs, questions, or contributions you might have. Let's tackle this issue together!

The issue you're experiencing is due to inconsistent behavior in the delete_chunks API when handling an empty chunk_ids array, depending on the document engine used. For DOC_ENGINE=infinity, it deletes all chunks, while for DOC_ENGINE=elasticsearch, it performs no deletion.

This inconsistency can indeed lead to unintended data loss when using Infinity. To address this, you might consider implementing a check in the API to ensure that an empty chunk_ids array does not trigger a deletion of all chunks, regardless of the document engine. This would align the behavior across different engines and prevent accidental deletions.
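
The kind of check suggested here could look roughly like the sketch below. It is an illustration under stated assumptions, not RAGFlow's implementation: `delete_chunks_guarded`, `fake_delete`, and the in-memory `store` are hypothetical names, and the stand-in backend simply mimics the "empty list deletes everything" behavior reported for Infinity.

```python
from typing import Callable, Sequence


def delete_chunks_guarded(
    chunk_ids: Sequence[str],
    delete_by_ids: Callable[[Sequence[str]], int],
) -> dict:
    """Engine-agnostic guard: an empty ID list never reaches the backend."""
    if not chunk_ids:
        return {"code": 0, "message": "delete 0 chunks"}
    deleted = delete_by_ids(chunk_ids)
    return {"code": 0, "message": f"delete {deleted} chunks"}


# Hypothetical in-memory stand-in for a document engine
store = {"c1", "c2", "c3"}


def fake_delete(ids: Sequence[str]) -> int:
    # Mimics the reported Infinity behavior: an empty list matches every chunk
    targets = set(store) if not ids else store & set(ids)
    store.difference_update(targets)
    return len(targets)


result_empty = delete_chunks_guarded([], fake_delete)
remaining_after_empty = len(store)
result_one = delete_chunks_guarded(["c1"], fake_delete)
remaining_after_one = len(store)
```

Because the guard returns before calling the backend, the result for an empty list no longer depends on which engine is configured.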

Unfortunately, I couldn't find specific documentation or code comments within the repository that explain the intended behavior for this scenario. It might be beneficial to review the logic in the delete_chunks API implementation for both engines to ensure consistent handling of empty inputs.

@asiroliu
Contributor Author

asiroliu commented Apr 1, 2025

@liwenju0
Thank you for your latest commit. Currently, both Infinity and Elasticsearch backends correctly delete all chunks when receiving an empty array. However, we've noticed a minor issue - the response returns code=102 when it should return code=0. The corresponding message should also be updated to match this successful operation status.

{"code": 102, "message": "rm_chunk deleted chunks 5, expect 0"}

KevinHuSh pushed a commit that referenced this issue Apr 2, 2025
…ssage (#6643)

…gic to return the correct deletion message. Add handling for empty
arrays to ensure no errors occur during the deletion operation. Update
the test cases to verify the new logic.

### What problem does this PR solve?

Fixes this bug: #6607

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: wenju.li <[email protected]>
@asiroliu asiroliu closed this as completed Apr 2, 2025
KevinHuSh pushed a commit that referenced this issue Apr 3, 2025
### What problem does this PR solve?

Update test cases for PR #6643 issue #6607

### Type of change

- [x] update test cases