Describe
When a table contains a large number of equality delete records, some data rows that are not logically deleted are missing from query results. When other engines (e.g., Apache Spark) read the same table, the results are correct and all expected rows are returned, which suggests that the table data and delete files are valid and that the problem is specific to this engine's equality-delete application or planning logic.
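For context, an equality delete record logically removes every data row whose values match the record on the delete file's equality columns, provided the row was written before the delete. Conceptually, a correct scan behaves roughly like the anti-join sketched below; the table, column, and sequence-number names are illustrative, not this engine's internals.

```sql
-- Conceptual semantics of equality deletes (illustrative schema):
-- a data row survives only if no equality delete record written later
-- (i.e., with a higher data sequence number) matches it on the equality column.
SELECT d.*
FROM data_rows AS d
ANTI JOIN equality_deletes AS e
  ON  d._id = e._id
  AND d.data_sequence_number < e.data_sequence_number;
```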
Env
DuckDB version: 1.4.1 (also occurs in 1.4.3)
Iceberg version: 1.5.2
Spark version: 3.5.4
Steps to reproduce
- Create a table and insert a dataset of size X (e.g., 50,000 rows).
- Write a large number of equality delete records Y (e.g., 49,995).
- Run a query such as:
SELECT * FROM <table>;
- Observe that some rows that are not deleted are missing from the result (a verification sketch follows this list).
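A minimal sketch of the verification query in DuckDB, assuming the iceberg extension is available and the table was created as in the steps above; the metadata path is a placeholder.

```sql
INSTALL iceberg;
LOAD iceberg;

-- With X = 50,000 inserted rows and Y = 49,995 equality deletes,
-- exactly 5 rows should survive; the bug shows up as a smaller count.
SELECT count(*) AS surviving_rows
FROM iceberg_scan('/path/to/table/metadata/<version>.metadata.json');
```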
Expected behavior
Queries should return all rows that are not logically deleted, regardless of the number of equality delete records.
Actual behavior
Some valid rows are filtered out when the number of equality delete records grows large.
Example (for quick reproduction)
- Download the attachment equality_test.zip
- Unzip the archive and place its contents in the directory /tmp/data/
- Execute in DuckDB:
select * from iceberg_scan('/tmp/data/equality_test/metadata/v3.metadata.json');
- Observe that the result is empty.

- Execute in DuckDB:
select * from iceberg_scan('/tmp/data/equality_test/metadata/v3.metadata.json') where _id = 'id_183550';
- Observe that the result is not empty.
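The two observations above can be contrasted in a single session. A sketch of such a cross-check, assuming the same /tmp/data layout; the temp table name is illustrative.

```sql
INSTALL iceberg;
LOAD iceberg;

-- Materialize the full scan so no filter is pushed into iceberg_scan.
CREATE TEMP TABLE full_scan AS
SELECT *
FROM iceberg_scan('/tmp/data/equality_test/metadata/v3.metadata.json');

-- If the bug reproduces, the first count is 0 while the second is 1:
-- the row is absent from the unfiltered scan but found by the filtered scan.
SELECT count(*) FROM full_scan WHERE _id = 'id_183550';

SELECT count(*)
FROM iceberg_scan('/tmp/data/equality_test/metadata/v3.metadata.json')
WHERE _id = 'id_183550';
```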
