write a load_example method #63

alex-hh · 2024-11-10T11:13:18Z

This assumes the dataset has an id field and an index.

alex-hh · 2024-11-10T11:26:55Z

The index will be a parquet file (stored without a file extension) that maps each id to its shard. We can then download just that one shard and retrieve the example from it.
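A minimal sketch of that lookup, under stated assumptions: the index is already in memory as a plain id-to-shard mapping, and `load_shard` is a hypothetical callable that downloads one shard and yields its rows as dicts. In practice the index would presumably be read from the parquet file (for example with `pyarrow.parquet.read_table`), but that part is left out here so the example runs with the standard library only.

```python
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


def load_example(
    example_id: str,
    index: Mapping[str, str],
    load_shard: Callable[[str], Iterable[Row]],
) -> Row:
    """Return the single row whose "id" field equals example_id.

    index:      maps example id -> shard name (assumed layout of the index file).
    load_shard: hypothetical helper that fetches one shard and yields its rows.
    """
    if example_id not in index:
        raise KeyError(f"id {example_id!r} is not in the index")
    shard_name = index[example_id]
    # Only this one shard is fetched and scanned; all other shards are untouched.
    for row in load_shard(shard_name):
        if row["id"] == example_id:
            return row
    raise LookupError(f"id {example_id!r} not found in shard {shard_name!r}")


# Toy stand-ins for real shards and a real index (illustrative data only).
toy_shards = {
    "shard-0": [{"id": "ex1", "text": "a"}, {"id": "ex2", "text": "b"}],
    "shard-1": [{"id": "ex3", "text": "c"}],
}
toy_index = {"ex1": "shard-0", "ex2": "shard-0", "ex3": "shard-1"}

example = load_example("ex3", toy_index, toy_shards.__getitem__)
```

Passing the loader in as a callable keeps the lookup logic independent of where the shards actually live.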

alex-hh · 2024-11-10T12:04:30Z

What we need:

- A split generator that looks for config- and split-specific index files (train_index or train/index).
- Index files let us subset both the parquet files and the examples within them.
- We then add a ds.filter before returning the dataset.
- There might be an efficient Arrow way to implement the filter.

(This could also go directly into the YAML, but the index-file solution is more modular.)
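The two-level subsetting described above can be sketched as follows. This is only an illustration under assumptions: the split index is taken to be an in-memory id-to-shard mapping containing just the ids that belong to the split, and `load_shard` is a hypothetical callable that yields a shard's rows as dicts. The per-row check plays the role that ds.filter would play on a real dataset.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping

Row = Mapping[str, Any]


def iter_split(
    split_index: Mapping[str, str],
    load_shard: Callable[[str], Iterable[Row]],
) -> Iterator[Row]:
    """Yield only the rows listed in a split-specific index.

    split_index: maps example id -> shard name for one split (assumed layout).
    load_shard:  hypothetical helper that yields the rows of one shard.
    """
    wanted_ids = set(split_index)
    # Step 1: subset the parquet files. Only shards named in the index are read.
    wanted_shards = sorted(set(split_index.values()))
    for shard_name in wanted_shards:
        # Step 2: subset the examples. Keep rows whose id is in the index.
        for row in load_shard(shard_name):
            if row["id"] in wanted_ids:
                yield row


# Toy data (illustrative only): three shards, a train index touching two of them.
all_shards = {
    "shard-0": [{"id": "ex1"}, {"id": "ex2"}],
    "shard-1": [{"id": "ex3"}, {"id": "ex4"}],
    "shard-2": [{"id": "ex5"}],
}
train_index = {"ex1": "shard-0", "ex4": "shard-1"}

train_rows = list(iter_split(train_index, all_shards.__getitem__))
```

For the Arrow-level variant, one candidate is to build a boolean mask with `pyarrow.compute.is_in` and pass it to `Table.filter`. Whether that is faster than ds.filter for this use case would need to be measured.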

alex-hh mentioned this issue Nov 14, 2024

Identifier-based split builder #76

Open
