Generic preprocessing module #79
Labels
enhancement
New feature or request
top priority
The issue is to be solved as soon as possible, as it may block the usage of the library
Is your feature request related to a problem? Please describe.
The pain is that, most often, plain datasets are not in the right input format or do not have the desired statistical characteristics. Furthermore, standard techniques like data augmentation need to be implemented.
Describe the solution you'd like
We build an API class (`AbstractClass`) for the preprocessing, a generic one. It should look similar to this one:
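A minimal sketch of such a class, reconstructed only from the three method names this issue mentions (`item_transform`, `dataset_level_data`, `batch_level_data`). The class names `AbstractPreprocessing`, `Standardize` and `PreprocessedDataset`, the method signatures, and the `_prepared` flag are all assumptions made for illustration, not the project's actual API. The wrapper shows one possible way a dataset class could call the two setup methods once, before the first `__getitem__`.

```python
from abc import ABC, abstractmethod
import statistics


class AbstractPreprocessing(ABC):
    """Hypothetical base class; only the three method names come from the issue."""

    @abstractmethod
    def dataset_level_data(self, dataset):
        """Compute values that depend on the whole dataset and store them on self."""

    @abstractmethod
    def batch_level_data(self, dataset):
        """Compute batch-level values and store them on self (signature assumed)."""

    @abstractmethod
    def item_transform(self, item):
        """Transform a single item using the values stored on self."""


class Standardize(AbstractPreprocessing):
    """Example subclass (assumption): rescale scalar items to zero mean, unit variance."""

    def dataset_level_data(self, dataset):
        self.mean = statistics.fmean(dataset)
        self.std = statistics.pstdev(dataset)

    def batch_level_data(self, dataset):
        pass  # nothing is needed at batch level for this transform

    def item_transform(self, item):
        return (item - self.mean) / self.std


class PreprocessedDataset:
    """Hypothetical dataset wrapper that applies a preprocessing object lazily."""

    def __init__(self, items, preprocessing):
        self.items = list(items)
        self.preprocessing = preprocessing
        self._prepared = False

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # Run the two setup methods only once, before the first item is served.
        if not self._prepared:
            self.preprocessing.dataset_level_data(self.items)
            self.preprocessing.batch_level_data(self.items)
            self._prepared = True
        return self.preprocessing.item_transform(self.items[index])


ds = PreprocessedDataset([1.0, 2.0, 3.0], Standardize())
print(ds[1])  # 0.0, since 2.0 is the mean of the data
```

Keeping the setup lazy means constructing the dataset stays cheap, at the cost of a slower first access.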
Each of the methods shall be implemented, as it will be called automatically inside the `Dataset` classes: the output of `__getitem__` will be transformed by `item_transform`. The data that `item_transform` needs to perform the transformation will be stored in `self`. The methods `dataset_level_data` and `batch_level_data` will be called only once, before the first time that `__getitem__` is called.
Describe alternatives you've considered
Only doing point 3 above (without 1 and 2). However, I find that it is always possible to use only that approach; it is much easier to implement and is less bound to the generic pipeline.
Additional context