-
Hello @AGKhalil,

Thank you very much for the kind words and your interest in the repo. You can indeed either load everything into RAM if your compute allows it or, better, you can create your own lazy-loading dataset:

```python
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

from pythae.data.datasets import DatasetOutput
from pythae.trainers import BaseTrainer

# Create your dataset
class ImageNet(Dataset):
    def __init__(self, data_dir=None, transforms=None):
        self.imgs_path = [os.path.join(data_dir, n) for n in os.listdir(data_dir)]
        self.transforms = transforms

    def __len__(self):
        return len(self.imgs_path)

    def __getitem__(self, idx):
        # images are opened only when indexed, so the full
        # dataset never needs to sit in RAM at once
        img = Image.open(self.imgs_path[idx]).convert("RGB")
        if self.transforms is not None:
            img = self.transforms(img)
        return DatasetOutput(data=img)

# define your pre-processing
img_transforms = transforms.Compose(
    [transforms.Resize((128, 128)), transforms.ToTensor()]
)

# instantiate the datasets
train_dataset = ImageNet(
    data_dir="/gpfsscratch/rech/wlr/uhw48em/data/imagenet/train",
    transforms=img_transforms,
)
eval_dataset = ImageNet(
    data_dir="/gpfsscratch/rech/wlr/uhw48em/data/imagenet/val",
    transforms=img_transforms,
)

# pass them to the Trainer
# (model, training_config and callbacks defined earlier in your script)
trainer = BaseTrainer(
    model=model,
    train_dataset=train_dataset,  ### here
    eval_dataset=eval_dataset,  ### here
    training_config=training_config,
    callbacks=callbacks,
)
```

I hope this helps :)

Best,
Clément
-
Hello and thank you for the great repo. Here it is mentioned:

> Note: The data in the `train_data.npz` and `eval_data.npz` files must be loadable as follows:

This works with MNIST and CIFAR-10, but CelebA is simply too large to load from a single `npz` file. Am I approaching this correctly?
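For context on why the single-file route breaks down: accessing an array stored in an `npz` archive decompresses it fully into memory, which is fine for small benchmarks but prohibitive at CelebA or ImageNet scale. A small self-contained illustration (the `data` key and shapes here are toy assumptions, not pythae's actual files):

```python
import os
import tempfile

import numpy as np

# Toy stand-in for train_data.npz: 10 "images" of shape 3x32x32
arr = np.zeros((10, 3, 32, 32), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "train_data.npz")
np.savez(path, data=arr)

# Accessing the "data" key materializes the whole array in RAM --
# fine for MNIST/CIFAR-10, prohibitive for millions of large images.
train_data = np.load(path)["data"]
print(train_data.shape)  # (10, 3, 32, 32)
```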