With data storage version Legacy and stable row IDs enabled, scanning the dataset with filter_expr returns the same batches repeatedly.
Steps to reproduce:
#[tokio::test]
async fn test_stable_row_id() {
    // Legacy storage version with stable row IDs enabled.
    let test_ds = TestVectorDataset::new(LanceFileVersion::Legacy, true)
        .await
        .unwrap();
    let dataset = &test_ds.dataset;
    let mut data = dataset
        .scan()
        .batch_readahead(get_num_compute_intensive_cpus())
        .project(&["vec"])
        .unwrap()
        .with_row_id()
        .filter_expr(datafusion_expr::col("vec").is_not_null())
        .try_into_stream()
        .await
        .unwrap();
    // Every row ID should be seen exactly once across all batches.
    let mut row_id_set = BTreeSet::new();
    while let Some(batch) = data.try_next().await.unwrap() {
        let row_ids = batch[ROW_ID].as_primitive::<UInt64Type>();
        row_ids.values().iter().for_each(|v| {
            assert!(row_id_set.insert(*v), "dup row id: {}", v);
        });
    }
}
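The duplicate check in the test above depends only on `BTreeSet::insert` returning `false` for a value that is already present. A minimal standalone sketch of that check, without any Lance types, might look like the following. The helper name `first_duplicate_row_id` and the use of plain `Vec<u64>` batches are assumptions made for illustration and are not part of the Lance API.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the first row ID that appears more than once
/// across `batches`, or `None` if every row ID is unique.
fn first_duplicate_row_id(batches: &[Vec<u64>]) -> Option<u64> {
    let mut seen = BTreeSet::new();
    for batch in batches {
        for &row_id in batch {
            // `BTreeSet::insert` returns false when the value was already present.
            if !seen.insert(row_id) {
                return Some(row_id);
            }
        }
    }
    None
}
```

If a scan returns the same batch twice, the first row ID of the repeated batch is reported, which corresponds to the "dup row id" assertion failure in the repro.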
BubbleCal changed the title from "bug: dup batches are returned if set filter_expr on legacy dataset with stable row ID enabled" to "bug: dup batch are returned repeatedly if set filter_expr on legacy dataset with stable row ID enabled" on Mar 5, 2025
BubbleCal changed the title from "bug: dup batch are returned repeatedly if set filter_expr on legacy dataset with stable row ID enabled" to "bug: dup batch is returned repeatedly if set filter_expr on legacy dataset with stable row ID enabled" on Mar 5, 2025