Reading large parquet datasets #2385

cdbethune · 2021-03-19T01:35:05Z

For image explanations, we needed to test against big earth single when unpooled and prefeaturized, which ends up being 10K rows x 32K images. The parquet loading library we use seems to chew through RAM aggressively and appears to leak memory while loading: a dataset that is about 5GB ends up occupying around 15GB of RAM after being loaded. It is also much slower than the parquet reader that is part of pandas, with performance degrading linearly as it reads more columns.
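To put numbers on the two symptoms above (memory growth and per-column slowdown), a small harness along these lines might help. This is only a sketch under stated assumptions: `profile_column_reads` and `amplification` are hypothetical helper names, the reader is passed in as a callable rather than tied to any particular parquet library, and `tracemalloc` may not capture allocations made by native code, so a process-level RSS measurement could be needed for a native reader.

```python
import time
import tracemalloc
from typing import Callable, Iterable, List, Tuple


def profile_column_reads(
    read_column: Callable[[str], object],
    columns: Iterable[str],
) -> Tuple[List[Tuple[str, float]], int]:
    """Read columns one at a time.

    Returns (per-column elapsed seconds, peak traced bytes).
    `read_column` is a hypothetical callable wrapping whatever reader is
    under test, e.g. with pandas:
        lambda c: pandas.read_parquet(path, columns=[c])
    """
    timings: List[Tuple[str, float]] = []
    loaded = []  # keep results alive so memory stays allocated, as in a full load
    tracemalloc.start()
    try:
        for name in columns:
            start = time.perf_counter()
            loaded.append(read_column(name))
            timings.append((name, time.perf_counter() - start))
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return timings, peak


def amplification(in_memory_bytes: float, dataset_bytes: float) -> float:
    """Ratio of memory occupied after loading to the size of the dataset."""
    return in_memory_bytes / dataset_bytes


if __name__ == "__main__":
    # Stand-in reader for illustration only; it just allocates a list.
    fake_reader = lambda name: [0] * 100_000
    timings, peak = profile_column_reads(fake_reader, ["col_a", "col_b", "col_c"])
    for name, seconds in timings:
        print(f"{name}: {seconds:.6f}s")
    print(f"peak traced bytes: {peak}")
    # Figures from the report: about 15GB in RAM for a dataset of about 5GB.
    print(f"amplification: {amplification(15, 5):.1f}x")
```

If the per-column times climb steadily as more columns are read, that would be consistent with the slowdown described here; running the same harness against the pandas reader gives a baseline to compare against.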

cdbethune added the bug label Mar 19, 2021

This was referenced Mar 19, 2021

Out of memory on model create with large datasets #2386

Open

Validate image explanations in search/classify workflow #2343

Closed
