Explore performance degradation from reading parquet files file by file

**Is your feature request related to a problem? Please describe.**
We were historically doing `pd.read_parquet(list_of_files)`, but have now moved to `pd.concat((pd.read_parquet(f) for f in files))`. The performance impact of reading file by file has not been explored yet.

https://github.com/NVIDIA-NeMo/Curator/pull/1249/files#r2546234214
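
A minimal sketch of how the two read paths could be timed against each other. It assumes pandas is installed with a parquet engine that accepts a list of paths (pyarrow does, which is what the original `pd.read_parquet(list_of_files)` call relied on). The names `read_batched`, `read_file_by_file`, and `best_time` are hypothetical helpers for illustration, not part of Curator.

```python
import time
from typing import Any, Callable, Sequence


def read_batched(files: Sequence[str]) -> Any:
    """Old path: hand the whole list of files to a single read_parquet call."""
    import pandas as pd  # imported here so best_time stays usable on its own

    return pd.read_parquet(list(files))


def read_file_by_file(files: Sequence[str]) -> Any:
    """New path: read each file separately and concatenate the results."""
    import pandas as pd

    return pd.concat((pd.read_parquet(f) for f in files))


def best_time(
    reader: Callable[[Sequence[str]], Any],
    files: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the fastest wall-clock time (in seconds) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(files)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Calling `best_time(read_batched, files)` and `best_time(read_file_by_file, files)` on the same list of parquet paths gives a first-order comparison. Taking the minimum over a few repeats reduces noise from file-system caching, though the first run will still include cold-cache effects.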





Explore performance degradation from reading parquet files file by file #1255
