dims check for auto-generated extension classes #480
It looks like the datasets are not passed into …
@bendichter I think the proper way to do this would be to add, in the future, the ability to express additional constraints on datasets in the specification language. The behavior you describe here would add implicit behavior to the API and specification language that may or may not apply in all cases. E.g., what should we do if two datasets have the same dimension names but are in subgroups of the same neurodata_type, or what about two datasets with a dimension …? I agree that being able to specify constraints on data is useful, but I think we need to make these kinds of constraints explicit (at least if we are to infer them directly from the specification). Users can always define additional constraints and validity checks in their custom NWBContainer API classes.
@oruebel Thanks for weighing in on this. I'd rather not make users create programming-language-specific checks for every extension class; then they'd have to implement this behavior in every class for every language (Python + MATLAB + ??). I think this enforces good practice: you should not be naming dimensions the same thing if they mean something different. Could you elaborate on your counter-examples where this would not be the desired behavior?

If you had 2D vs. 3D info for electrode positions (a use case I have encountered), first of all it would be unlikely that you'd want to specify both in a single extension. But even if you did, I don't think you would run into a problem here. You'd have:

2D: `shape=(None, 2), dims=('elecs', 'xy')`
3D: `shape=(None, 3), dims=('elecs', 'xyz')`

Here only `'elecs'` is shared between the two datasets, and requiring it to match in length is exactly the desired behavior. If you are really worried about it, we could just make it a warning, but my preference is to mandate that dimensions actually mean something and that dimensions with the same name be equal in length.
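The check described above can be sketched in plain Python. This is a hypothetical helper for illustration only, not part of pynwb; the `check_dims` name and its `{name: (data, dims)}` input convention are invented here:

```python
import warnings
from collections import defaultdict

import numpy as np


def check_dims(datasets, error=True):
    """Check that dimensions sharing a label have equal lengths.

    ``datasets`` maps dataset name -> (array_like, tuple of dimension labels).
    Hypothetical helper sketching the proposed behavior; not pynwb API.
    """
    seen = defaultdict(dict)  # dim label -> {dataset name: axis length}
    for name, (data, dims) in datasets.items():
        for label, length in zip(dims, np.shape(data)):
            seen[label][name] = length
    for label, uses in seen.items():
        if len(set(uses.values())) > 1:
            msg = "dimension %r has inconsistent lengths: %s" % (label, uses)
            if error:
                raise ValueError(msg)
            warnings.warn(msg)


# The 2D/3D electrode example above: shared label 'elecs' matches (4 == 4),
# while 'xy' and 'xyz' are distinct labels, so no conflict is reported.
check_dims({
    'pos2d': (np.zeros((4, 2)), ('elecs', 'xy')),
    'pos3d': (np.zeros((4, 3)), ('elecs', 'xyz')),
})
```

With `error=False` the same mismatch would only emit a warning, matching the weaker alternative discussed above.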
My main concerns are that I don't like implicit behaviors and I also don't think that this can be handled via implicit rules. Basically, any time a dimension refers to something of arbitrary length (e.g., time, voltage, m/z, etc.), two datasets may not need to have the same shape. Also, it is unclear when the implicit rule would apply, e.g., what happens if two datasets that share constraints are in different subgroups?
If we don't want to define the constraints in the API, then we need to be able to do this explicitly in the spec, i.e., we'd need to change the specification language. I agree that this would actually be the proper way to do it. However, implementing it will require some deep changes, i.e., additional features in the specification language, specification API, code generators, validators, etc., so this is a non-trivial project. I would suggest that we:
Since this is adding new functionality, I believe all of this can be done without breaking backward compatibility. Given this and the scope of the project, I think this will likely be a project for 2.x.
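For concreteness, an explicit, opt-in cross-dataset constraint in the spec might look something like the following. This is purely a hypothetical sketch: the `constraints` key and `equal_length` type are invented for illustration and are not part of the current specification language.

```python
# Hypothetical sketch of an explicit constraint declared alongside the
# dataset specs of a group. The 'constraints' key and 'equal_length' type
# are invented here; they do NOT exist in the NWB specification language.
group_spec = {
    'neurodata_type_def': 'DimsTest',
    'datasets': [
        {'name': 'dset1', 'dims': ['cells']},
        {'name': 'dset2', 'dims': ['cells']},
    ],
    'constraints': [
        {'type': 'equal_length', 'dim': 'cells', 'datasets': ['dset1', 'dset2']},
    ],
}
```

Making the constraint an explicit declaration like this avoids the ambiguity of an implicit rule: it applies only where the spec author opts in.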
OK, fair enough, you might run into this issue with:

```python
from pynwb import TimeSeries

TimeSeries(name='name', source='source', data=[1., 2., .3], timestamps=[0., 1.])
```

I agree, if this requires changes to the schema language it's clearly a 2.x issue.
I would like classes that are auto-generated by extensions to automatically check that dimensions with the same label across datasets have the same length. If they do not, either throw an error or a warning. For example, in the code below, `dset1` and `dset2` are in `DimsTest` and both have a dimension named `'cells'`. I would like it to be enforced that the lengths of these datasets must be equal. The code demonstrates that this is not currently the case, since there is no warning or error when I set `dset1=[0, 1, 2, 3], dset2=[0, 1, 2]`. I would prefer if this check occurred in the `__init__` of `NWBDataInterface`, as opposed to on write or in a separate validator.
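The requested `__init__`-time enforcement could be sketched like this. `DimsCheckedInterface` and its `dims_spec` attribute are hypothetical stand-ins for `NWBDataInterface` and the dims declared in an extension spec; only `DimsTest`, `dset1`, and `dset2` come from the issue text:

```python
import numpy as np


class DimsCheckedInterface:
    """Hypothetical base class sketching a dims-length check at construction
    time (the behavior requested for NWBDataInterface.__init__). Not real
    pynwb code; an auto-generated extension class would fill in dims_spec."""

    dims_spec = {}  # dataset name -> tuple of dimension labels

    def __init__(self, **datasets):
        lengths = {}  # dimension label -> first axis length seen
        for name, data in datasets.items():
            for label, length in zip(self.dims_spec[name], np.shape(data)):
                if lengths.setdefault(label, length) != length:
                    raise ValueError(
                        "dimension %r: %s has length %d, expected %d"
                        % (label, name, length, lengths[label]))
            setattr(self, name, data)


class DimsTest(DimsCheckedInterface):
    dims_spec = {'dset1': ('cells',), 'dset2': ('cells',)}


# Matching lengths construct fine; dset1=[0, 1, 2, 3], dset2=[0, 1, 2]
# would raise ValueError because both declare the 'cells' dimension.
DimsTest(dset1=[0, 1, 2, 3], dset2=[0, 1, 2, 3])
```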