
dims check for auto-generated extension classes #480

Open · bendichter opened this issue Apr 23, 2018 · 5 comments

Labels: category: enhancement (improvements of code or code behavior) · category: proposal (proposed enhancements or new features) · priority: low (alternative solution already working and/or relevant to only specific user(s))

Comments
@bendichter
Contributor

I would like classes that are auto-generated by extensions to automatically check that dimensions with the same label across datasets have the same length, and to throw an error or issue a warning if they do not. For example, in the code below, dset1 and dset2 are in DimsTest and both have a dimension named 'cells'. I would like it to be enforced that the lengths of these datasets must be equal. The code demonstrates that this is not currently the case, since there is no warning or error when I set dset1=[0, 1, 2, 3], dset2=[0, 1, 2]. I would prefer if this check occurred in the __init__ of NWBDataInterface, as opposed to on write or in a separate validator.

from datetime import datetime

from pynwb.spec import (NWBGroupSpec, NWBDatasetSpec, NWBNamespaceBuilder)
from pynwb import get_class, load_namespaces, NWBHDF5IO, NWBFile


project_name = 'dims_test'
ns_path = project_name + '.namespace.yaml'
ext_source = project_name + '.extensions.yaml'
ns_builder = NWBNamespaceBuilder(project_name, project_name)

datasets = [
    NWBDatasetSpec(doc='dset1', dtype='uint32', shape=(None,),
                   name='dset1', dims=('cells',)),
    NWBDatasetSpec(doc='dset2', dtype='uint64', shape=(None,),
                   name='dset2', dims=('cells',))
]

cont_data = NWBGroupSpec(doc='data',
                         datasets=datasets,
                         neurodata_type_inc='NWBDataInterface',
                         neurodata_type_def='DimsTest')

ns_builder.add_spec(ext_source, cont_data)
ns_builder.export(ns_path)

##

load_namespaces(ns_path)
DimsTest = get_class('DimsTest', project_name)

dims_test = DimsTest(name='name', source='source', dset1=[0, 1, 2, 3], dset2=[0, 1, 2])

nwbfile = NWBFile(source='source', session_description='session_description',
                  identifier='identifier', session_start_time=datetime.now(),
                  file_create_date=datetime.now())

module = nwbfile.create_processing_module(name='name',
                                          source='source',
                                          description='description')
module.add_container(dims_test)

with NWBHDF5IO('dims_test.nwb', mode='w') as io:
    io.write(nwbfile)
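The requested check could be sketched as a standalone helper. This is purely illustrative: `check_shared_dims` is a hypothetical function, not part of pynwb, and a real implementation would need access to each container's spec to know the `dims` labels.

```python
import warnings

def check_shared_dims(datasets, strict=True):
    """Verify that dimensions sharing a label across datasets have equal length.

    `datasets` maps dataset name -> (data, dims), where `dims` holds one
    label per axis of `data`. Raises ValueError on a mismatch if `strict`,
    otherwise issues a warning.
    """
    seen = {}  # dim label -> (length, name of dataset that defined it)
    for name, (data, dims) in datasets.items():
        shape = getattr(data, 'shape', (len(data),))  # plain lists assumed 1D
        for axis, label in enumerate(dims):
            length = shape[axis]
            if label in seen and seen[label][0] != length:
                msg = ("dimension '%s' has length %d in '%s' but %d in '%s'"
                       % (label, seen[label][0], seen[label][1], length, name))
                if strict:
                    raise ValueError(msg)
                warnings.warn(msg)
            else:
                seen.setdefault(label, (length, name))

# Mirrors the repro above: dset1 has 4 'cells', dset2 only 3
try:
    check_shared_dims({'dset1': ([0, 1, 2, 3], ('cells',)),
                       'dset2': ([0, 1, 2], ('cells',))})
except ValueError as e:
    print(e)
```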

Checklist

  • Have you ensured the feature or change was not already reported?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
@bendichter added the 'category: enhancement' and 'category: proposal' labels on Apr 23, 2018
@bendichter
Contributor Author

It looks like the datasets are not passed into NWBDataInterface.__init__, so this would require a bit of a deeper change.

@oruebel
Contributor

oruebel commented Apr 23, 2018

@bendichter I think the proper way to do this will be to add, in the future, the ability to express additional constraints on datasets in the specification language. The behavior you describe here would add implicit behavior to the API and specification language that may or may not apply in all cases. E.g., what should we do if two datasets have the same dimension names but are in subgroups of the same neurodata_type, or what about two datasets storing a location where one is 2D and the other is 3D?

I agree that being able to specify constraints on data is useful, but I think we need to make these kinds of constraints explicit (at least if we are to infer these constraints directly from the specification). Users can always define additional constraints and validity checks in their custom NWBContainer API classes.
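For illustration only, an explicit constraint in the specification language might hypothetically look like the sketch below. The `constraints` key and `equal_length` type do not exist in the NWB specification language; this is just one way the idea could be expressed.

```yaml
datasets:
- name: dset1
  dtype: uint32
  shape: [null]
  dims: [cells]
- name: dset2
  dtype: uint64
  shape: [null]
  dims: [cells]
constraints:          # hypothetical key, not part of the spec language
- type: equal_length  # hypothetical constraint type
  dims: [dset1/cells, dset2/cells]
```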

@bendichter
Contributor Author

bendichter commented Apr 23, 2018

@oruebel Thanks for weighing in on this. I'd rather not make users create programming-language-specific checks for every extension class. Then they'd have to implement this behavior in every class, for every language (Python + MATLAB + ??).

I think this enforces good practice. You should not be naming dimensions the same thing if they mean something different. Could you elaborate on your counter-examples where this would not be the desired behavior? If you had 2D vs 3D info for electrode positions (a use-case I have encountered), first of all it would be unlikely that you'd want to specify both in a single extension. But even if you did, I don't think you would run into a problem here. You'd have:

2D:

shape=(None, 2), dims=('elecs','xy')

and 3D:

shape=(None, 3), dims=('elecs','xyz')

Here only elecs would be enforced to match, and it should be the same in both cases -- the number of electrodes does not change whether positions are stored in 2D or 3D.

If you are really worried about it, we could just make it a warning, but my preference is to mandate that dimensions actually mean something and they should be equal.

@oruebel
Contributor

oruebel commented Apr 23, 2018

My main concerns are that I don't like implicit behaviors and I also don't think that this can be handled via implicit rules. Basically, any time a dimension refers to something of arbitrary length (e.g., time, voltage, m/z, etc.), two datasets need not have the same length along it. Also, it is unclear when the implicit rule would apply. E.g., what happens if two datasets that share constraints are in different subgroups?

> I'd rather not make users create programming-language-specific checks for every extension class.

If we don't want to define the constraints in the API, then we need to be able to express them explicitly in the spec, i.e., we'd need to change the specification language. I agree that would actually be the proper way to do this.

However, implementing this will require some deep changes, i.e., additional features for the specification language, specification API, code generators, validators, etc., so this is a non-trivial project. I would suggest that we:

  1. come up with a list of specific constraints we want to express
  2. create a design for how we want to express these constraints in the specification language
  3. define how this should be handled in the API

Since this is adding new functionality, I believe all of this can be done without breaking backward compatibility. Given this and the scope of the project, I think this will likely be a project for 2.x.

@bendichter
Contributor Author

bendichter commented Apr 23, 2018

OK, fair enough: you might run into this issue with 'time' if you had two different TimeSeries objects with different sampling rates in the same extension class. However, I'd say it's a missing feature that the number of timestamps is not required to match the first dimension of data in TimeSeries, e.g.:

from pynwb import TimeSeries
TimeSeries(name='name', source='source', data=[1., 2., .3], timestamps=[0., 1.])
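The length check hinted at here could be sketched as follows. `timestamps_match` is a hypothetical helper, not a pynwb API; it only illustrates the comparison such a check would make.

```python
def timestamps_match(data, timestamps):
    """Return True iff len(timestamps) equals the first dimension of data."""
    first_dim = getattr(data, 'shape', (len(data),))[0]  # plain lists assumed 1D
    return len(timestamps) == first_dim

assert not timestamps_match([1., 2., .3], [0., 1.])  # the mismatch above: 3 vs 2
assert timestamps_match([1., 2., 3.], [0., 1., 2.])  # lengths agree
```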

I agree, if this requires changes to the schema language it's clearly a 2.x issue.

@oruebel oruebel added this to the NWB 2.x milestone May 18, 2018
@rly rly self-assigned this Apr 11, 2024
@rly rly modified the milestones: NWB 2.x, Future Apr 11, 2024
@stephprince added the 'priority: low' label on Nov 4, 2024