Description
Motivation
The current implementation of MinariExperienceReplay requires datasets to be downloaded using the class itself, which creates an env_metadata.json file in the target directory. This workflow does not accommodate custom Minari datasets created by users or datasets that have been loaded into the local Minari cache by other means (e.g., through minari.load_dataset or custom dataset creation via DataCollector).
As a result, attempting to instantiate MinariExperienceReplay with download=False for locally available datasets leads to a FileNotFoundError due to missing metadata files, even though the dataset exists in the Minari cache. This limitation is frustrating for users who want to leverage their own datasets without redownloading or duplicating data, and it hinders workflows where datasets are managed independently of TorchRL.
This issue proposes loading datasets directly from the local Minari cache (typically ~/.minari/datasets) without requiring prior setup via MinariExperienceReplay's download workflow, making the class more flexible and compatible with custom and preloaded datasets.
Solution
Add and fully support the argument load_from_local_minari in the MinariExperienceReplay class. When set to True, this argument will instruct the class to:
- Look for the dataset in the user's local Minari cache (e.g., ~/.minari/datasets/{dataset_id}/data/main_data.hdf5).
- Bypass any download or remote fetching logic.
- If the required files are present, load the dataset and construct any necessary metadata on-the-fly (e.g., from the Minari dataset spec, if possible).
- Raise a clear and informative FileNotFoundError if the dataset is not found in the expected local cache location.
- Ensure that custom datasets created by users (such as those with DataCollector(...).create_dataset(...)) or datasets first loaded with minari.load_dataset can be used seamlessly with MinariExperienceReplay.
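The lookup and error behavior listed above could be sketched roughly as follows. This is only an illustration, not TorchRL's or Minari's actual code: the helper name find_local_minari_data and its root parameter are hypothetical, and the fallback to a MINARI_DATASETS_PATH environment variable is an assumption about how Minari locates its cache. The directory layout is the one given in this issue.

```python
import os
from pathlib import Path
from typing import Optional, Union


def find_local_minari_data(
    dataset_id: str, root: Optional[Union[str, Path]] = None
) -> Path:
    """Hypothetical helper: locate main_data.hdf5 for a dataset in the local cache."""
    if root is None:
        # Assumption: an environment variable may override the default cache location.
        root = os.environ.get("MINARI_DATASETS_PATH") or Path.home() / ".minari" / "datasets"
    # Layout taken from this issue: {root}/{dataset_id}/data/main_data.hdf5
    data_file = Path(root).expanduser() / dataset_id / "data" / "main_data.hdf5"
    if not data_file.is_file():
        # No download is attempted; fail with a message that names the expected path.
        raise FileNotFoundError(
            f"Minari dataset '{dataset_id}' was not found in the local cache "
            f"(expected {data_file}). Create it with a DataCollector or load it "
            "with Minari first."
        )
    return data_file
```

A check like this would run before any TorchRL-side preprocessing, so a missing dataset fails early with a message that points at the expected path rather than at a missing env_metadata.json.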
This solution allows for greater flexibility, avoids unnecessary downloads and data duplication, and makes TorchRL compatible with the wider Minari ecosystem.
Alternatives
- Manual copying of files: Users could manually copy datasets and metadata to the expected TorchRL directory, but this is error-prone and not user-friendly.
- Automated metadata generation scripts: Provide standalone tools for generating env_metadata.json based on existing Minari datasets. This adds maintenance burden and complexity for users.
Additional context
- The new load_from_local_minari argument should default to False to preserve backward compatibility.
- If load_from_local_minari=True is set, the MinariExperienceReplay class will prioritize loading the dataset directly from the local Minari cache (typically located at ~/.minari/datasets). If the dataset exists in the cache, the class will skip any fetching from the Minari server; no remote download or overwrite will occur. After loading the dataset from the local cache, all subsequent preprocessing and loading steps will proceed as usual, ensuring the dataset is processed and made available correctly.
- This feature will facilitate workflows for research, benchmarking, and development using custom or proprietary datasets, and it is more in line with how Minari itself manages datasets locally.
- Example usage:
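import minari
from torchrl.data.datasets import MinariExperienceReplay
from torchrl.data.replay_buffers import SamplerWithoutReplacement

# dataset_id is the ID of a dataset already present in the local Minari cache,
# e.g. one created with DataCollector or fetched through minari.
data = MinariExperienceReplay(
    dataset_id=dataset_id,
    split_trajs=False,
    batch_size=128,
    sampler=SamplerWithoutReplacement(drop_last=True),
    prefetch=4,
    load_from_local_minari=True,  # <--- key addition
)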
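The precedence described in the notes above (local cache first when the flag is set, unchanged behavior otherwise) might look like the sketch below. The function select_source, the Source enum, and the in_local_cache parameter are hypothetical names used only for illustration, and download=True as the existing default is an assumption. None of this is the class's real internals.

```python
from enum import Enum


class Source(Enum):
    LOCAL_MINARI_CACHE = "local_minari_cache"      # proposed new path
    REMOTE_DOWNLOAD = "remote_download"            # existing download workflow
    TORCHRL_PREPROCESSED = "torchrl_preprocessed"  # existing download=False path


def select_source(
    *,
    load_from_local_minari: bool = False,  # defaults to False for backward compatibility
    download: bool = True,                 # assumed default of the existing argument
    in_local_cache: bool = False,
) -> Source:
    """Hypothetical decision logic for where the dataset is read from."""
    if load_from_local_minari:
        if not in_local_cache:
            raise FileNotFoundError("dataset not found in the local Minari cache")
        # The flag takes priority: nothing is fetched from the Minari server.
        return Source.LOCAL_MINARI_CACHE
    # Flag unset: behavior is exactly what it is today.
    return Source.REMOTE_DOWNLOAD if download else Source.TORCHRL_PREPROCESSED
```

Whichever branch is taken, the preprocessing and loading steps that follow stay the same, which keeps the change confined to where the raw data comes from.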
Checklist
- I have checked that there is no similar issue in the repo (required)