Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion for dataset factory template to replace gdeep.data.torch_datasets.TorchDataLoader #80

Closed
wants to merge 15 commits into from

Conversation

raphaelreinauer
Copy link
Collaborator

@raphaelreinauer raphaelreinauer commented May 8, 2022

Reference issues/PRs

[BUG] Naming of the module gdeep.data.torch_datasets #66.

Furthermore, it would be great to have a unified interface for loading custom datasets
and the default datasets. This would also allow for a consistent API for loading
datasets. The concrete issue I have is that I want to add my persistence dataset from
graphs using the same interface, though I would have to make a lot of changes to the
existing implementation of the TorchDataLoader class and hence making it harder to
maintain.

The current implementation does not allow to the passing of additional kwargs to the
TorchDataset class and these are passed onward to the custom Datasets. Compare
the issue that Nicolas had about varying the number of points per torus in the
torus dataset.
Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

  • Create a factory that can be used to create data loaders for MNIST and the three sphere datasets.
  • This allows for a consistent API for loading the datasets, which was previously missing.
  • Additional kwargs can be passed to the get_dataset function and these are passed onward to the custom Datasets.
  • New Dataset can be easily added to the dataset factory.
  • The eval statement in the previous version is now replaced, resulting in cleaner and checked code.
  • This approach decouples the dataset loading from dataset generation, allowing for more flexibility and easier testing.
  • The previous version could be moved to the gdeep.data.torch_datasets module and marked deprecated.

Notice this is only a suggestion on how this streamline the dataset API!

@raphaelreinauer raphaelreinauer requested a review from matteocao May 8, 2022 15:42
@raphaelreinauer raphaelreinauer added the enhancement New feature or request label May 8, 2022
@raphaelreinauer raphaelreinauer linked an issue May 8, 2022 that may be closed by this pull request
@raphaelreinauer raphaelreinauer marked this pull request as draft May 8, 2022 19:50
@raphaelreinauer raphaelreinauer changed the title Suggestion for dataset factory template Suggestion for dataset factory template to replace gdeep.data.torch_datasets.TorchDataLoader May 8, 2022
Copy link
Contributor

@matteocao matteocao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate the factory patter proposal + the refactoring of the constants to from the preprocessing.

The factory part can be merge, the preprocessing part of the other hand needs to adapt to the new API

"""
factory = DatasetFactory()
factory.register_builder("Torchvision", TorchvisionDatasetBuilder())
factory.register_builder("Tori", TorchvisionDatasetBuilder())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is should be the ToriDatasetBuilder

import numpy as np
import torch
from transformers import BatchFeature
from transformers.feature_extraction_sequence_utils import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion, we have decided to change the API. Can you please remove this change from this branch and only keep the factory proposal as well as the creation of the consternates in gdeep.utility

import numpy as np
import torch
from transformers import BatchFeature
from transformers.feature_extraction_sequence_utils import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion, we have decided to change the API. Can you please remove this change from this branch and only keep the factory proposal as well as the creation of the consternates in gdeep.utility




class PersistenceDiagramFeatureExtractor(FeatureExtractionMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here: API has changed




class PersistenceDiagramFeatureExtractor(FeatureExtractionMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here: API has changed

attention_masks = features['attention_mask']
del pd_extractor
pd_extractor = PersistenceDiagramFeatureExtractor.from_pretrained('preprocessor_config.json')
# %%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion, we have decided to change the API. Can you please remove this file from this branch and only keep the factory proposal as well as the creation of the consternates in gdeep.utility

@@ -0,0 +1,17 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion, we have decided to change the API. Can you please remove this file from this branch and only keep the factory proposal as well as the creation of the consternates in gdeep.utility

@@ -0,0 +1,17 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after our discussion, we have decided to change the API. Can you please remove this file from this branch and only keep the factory proposal as well as the creation of the consternates in gdeep.utility

@@ -0,0 +1,5 @@
# %%
import torch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is incomplete, can you please remove it?

@@ -0,0 +1,5 @@
# %%
import torch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is incomplete, can you please remove it?

@matteocao matteocao linked an issue May 12, 2022 that may be closed by this pull request
@raphaelreinauer raphaelreinauer deleted the dataset_builder branch September 5, 2022 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants