
How to remove memory cached by DALI #5816

Open
@zmasih

Description


Describe the question.

I am working on optimizing a DALI pipeline for benchmarking data-loading performance and want to clarify DALI’s memory management and caching behavior.

Based on the documentation, I understand that:
  • DALI does not release memory back to the system but instead reuses it via a global memory pool.
  • Deleting a pipeline does not free its memory, but it allows a new pipeline to reuse the allocated memory.
  • Pipeline recreation adds significant overhead due to initialization costs.
  • DALI readers (e.g., fn.readers.tfrecord) may have internal caching mechanisms that persist between pipeline instances.

Is my understanding correct?

To be more specific, I have the following questions:
  1. Does DALI reuse any previously cached dataset buffers when a new pipeline is created, or does each new pipeline force a full reload from storage?
  2. If a new pipeline reuses allocated memory, does that mean the dataset itself might still be cached in RAM (or another internal buffer) rather than being freshly loaded?
  3. Are random_shuffle=True and cache_header_information=False sufficient to guarantee that each batch is read fresh from disk/memory, even when using the same pipeline instance?
  4. Would manually calling nvidia.dali.backend.ReleaseUnusedMemory() ensure fresh data loading, or does it only affect unused memory blocks?

My goal is to ensure that each iteration loads data fresh from storage (not from cached batches). Any insight into how DALI handles this at a low level would be highly appreciated.
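To make the question concrete, here is a rough sketch of the benchmarking loop I have in mind. The file paths, batch size, and feature schema are placeholders, not my real configuration:

```python
# Sketch of the benchmarking loop in question (paths and sizes are
# placeholders, not the real configuration).
import nvidia.dali.fn as fn
import nvidia.dali.backend as backend
import nvidia.dali.tfrecord as tfrec
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def tfrecord_pipeline():
    inputs = fn.readers.tfrecord(
        path="data.tfrecord",    # placeholder path
        index_path="data.idx",   # placeholder index file
        features={"image": tfrec.FixedLenFeature((), tfrec.string, "")},
        random_shuffle=True,
    )
    return fn.decoders.image(inputs["image"], device="mixed")

for run in range(3):
    pipe = tfrecord_pipeline()
    pipe.build()
    for _ in range(10):
        pipe.run()  # this is the call I am timing
    del pipe
    # Question: does this make the next pipeline re-read from disk,
    # or does it only return unused pool blocks to the allocator?
    backend.ReleaseUnusedMemory()
```

In other words, between outer iterations I want to know whether the second pipeline's reader starts from a cold state or inherits buffers from the first.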

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

Metadata

Labels

question (Further information is requested)
