Fix chunking bug with compound dtypes #1146

Merged
merged 31 commits into from
Feb 17, 2025

Conversation

pauladkisson
Member

@pauladkisson pauladkisson commented Nov 22, 2024

@pauladkisson pauladkisson marked this pull request as ready for review December 5, 2024 00:48
@pauladkisson pauladkisson requested a review from rly December 5, 2024 00:50
@pauladkisson
Member Author

@rly, lmk what you think

@pauladkisson
Member Author

@h-mayorquin, this is ready for review. Basically I use the hdmf.build.builders.BaseBuilder to check if a neurodata object would have a compound dtype. Most of the complexity is introduced by the need to find a match between the neurodata object and its location in the builder, which is outlined in the docstrings. Lmk what you think!
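
A minimal sketch of the lookup described above, using plain dictionaries as a stand-in for the builder tree. The `find_dataset` and `has_compound_dtype` helpers and the dictionary layout are illustrative assumptions, not the code in this PR or the hdmf API; the one idea carried over is that a compound dtype would show up as a list of field specifications rather than a single type name.

```python
# Hypothetical stand-in for a builder tree: groups are dicts keyed by name,
# datasets are dicts carrying a "dtype" entry. These are NOT hdmf builder classes.
builder = {
    "processing": {
        "ophys": {
            "PlaneSegmentation": {
                "pixel_mask": {
                    # Assumed representation: one spec per field of the record.
                    "dtype": [
                        {"name": "x", "dtype": "uint32"},
                        {"name": "y", "dtype": "uint32"},
                        {"name": "weight", "dtype": "float32"},
                    ]
                },
                "id": {"dtype": "int64"},
            }
        }
    }
}


def find_dataset(tree: dict, location: str) -> dict:
    """Walk a slash-separated location down the tree and return the node found."""
    node = tree
    for part in location.strip("/").split("/"):
        if part not in node:
            raise KeyError(f"'{part}' not found while resolving '{location}'")
        node = node[part]
    return node


def has_compound_dtype(tree: dict, location: str) -> bool:
    """Assumption: a compound dtype is represented as a list of field specs."""
    return isinstance(find_dataset(tree, location)["dtype"], list)


print(has_compound_dtype(builder, "processing/ophys/PlaneSegmentation/pixel_mask"))
print(has_compound_dtype(builder, "processing/ophys/PlaneSegmentation/id"))
```

The matching step is the fragile part: the location string has to agree with where the object actually sits in the tree, which is why the sketch raises a `KeyError` with the full location instead of silently returning `False`.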

@h-mayorquin
Collaborator

I did a first reading. Two things:

  • I think we should update hdmf in the pyproject to the latest version or greater.
  • Looking at the tests that were failing before, they seem related to the pixel mask. Can we create a more direct test of this in the dataset configuration tests? I think it would be better to have something more unit-test-like that would fail quicker if we break this (or if we want to refactor). Can we build something simpler with pixel-mask so we don't rely on the full segmentation conversion test?

@pauladkisson
Member Author

I think we should update hdmf in the pyproject to the latest version or greater.

I updated the pin to include everything <4, since 4.0 has zarr-related issues. We should be able to add hdmf 4.0 soon; see #1191.

@pauladkisson
Member Author

Looking at the tests that were failing before, they seem related to the pixel mask. Can we create a more direct test of this in the dataset configuration tests? I think it would be better to have something more unit-test-like that would fail quicker if we break this (or if we want to refactor). Can we build something simpler with pixel-mask so we don't rely on the full segmentation conversion test?

Definitely needs some unit tests. I'll put together some.

@pauladkisson
Member Author

From Meeting: move has_compound_dtype inside get_data_shape

@pauladkisson
Member Author

From Meeting: move has_compound_dtype inside get_data_shape

Actually, get_data_shape comes from hdmf.utils, so no way to move that in this PR. Also, if I remember correctly, they use get_data_shape in ways that would make incorporating this compound_dtype fix difficult.

@pauladkisson
Member Author

@h-mayorquin, I added tests, so this should be good to go!

Collaborator

@h-mayorquin h-mayorquin left a comment

My apologies regarding the get_data_shape function.

I propose we create a new function called get_full_data_shape in src/neuroconv/tools/hdmf.py and consolidate all related functionality there. It seems that this should be something provided by HDMF, and isolating this complexity from the rest of the code would be ideal.

There is also a coupling issue with the calculation of the manager for the entire file. I would prefer not to require from_neurodata_object to accept an additional, build-related argument. However, calculating the manager for every dataset is probably too expensive, and using the manager to build only the dataset doesn't work (based on my initial attempt).

Requests:

  1. Centralize Logic:
    Move the code to hdmf.py in the tools directory and create a new function, get_full_data_shape, that takes the builder as an optional argument so that all the logic is centralized. In the docstring, document that get_data_shape fails for compound objects and that the corrected behavior is what we need when building a dataset I/O configuration object.

  2. Add Tests:
    Add a test that asserts the correct behavior of either the new function get_full_data_shape or the dataset I/O configuration produced by the from_neurodata_object method. See the more concrete request in the review.
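
The request above can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions: `naive_shape` imitates a shape inference that recurses into nested sequences (a stand-in, not hdmf's `get_data_shape`), and the `is_compound` flag replaces the optional builder argument, whose real lookup is not reproduced here. The sketch also assumes each compound record is a tuple sitting in the last level of nesting.

```python
from typing import Any, Tuple


def naive_shape(data: Any) -> Tuple[int, ...]:
    """Stand-in shape inference: keep descending into nested lists/tuples."""
    shape = []
    while isinstance(data, (list, tuple)):
        shape.append(len(data))
        if len(data) == 0:
            break
        data = data[0]
    return tuple(shape)


def get_full_data_shape(data: Any, is_compound: bool = False) -> Tuple[int, ...]:
    """Return the dataset shape, counting each compound record as one element.

    Hypothetical signature: the flag stands in for "the builder says this
    dataset has a compound dtype". When it is set, the innermost dimension
    found by the naive inference is really the record's fields, so drop it.
    """
    shape = naive_shape(data)
    if is_compound and len(shape) > 1:
        return shape[:-1]
    return shape


# Four (x, y, weight) records, as in a pixel mask.
pixel_mask = [(0, 0, 1.0), (0, 1, 0.5), (1, 1, 0.25), (2, 3, 0.75)]

print(naive_shape(pixel_mask))                            # (4, 3)
print(get_full_data_shape(pixel_mask, is_compound=True))  # (4,)
```

A chunk shape recommended from `(4, 3)` would have one dimension too many for a dataset that is actually one-dimensional, which is the kind of mismatch a direct unit test on the shape function could catch without running a full segmentation conversion.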

@pauladkisson pauladkisson enabled auto-merge (squash) February 17, 2025 19:59
Collaborator

@h-mayorquin h-mayorquin left a comment

LGTM

Should we, in another PR, increase the floor of the hdmf version? Remove the ceiling?

@pauladkisson
Member Author

Should we, in another PR, increase the floor of the hdmf version? Remove the ceiling?

Sure, can you take care of that?

@h-mayorquin
Collaborator

@pauladkisson
I can take care of the ceiling, but I am not certain whether we should increase the floor. Any idea?

@pauladkisson
Member Author

I can take care of the ceiling, but I am not certain whether we should increase the floor. Any idea?

I don't think we need to as long as it remains compatible with spikeinterface, etc.

@pauladkisson pauladkisson merged commit a896663 into main Feb 17, 2025
40 checks passed
@pauladkisson pauladkisson deleted the fix_dev_tests branch February 17, 2025 22:12

codecov bot commented Feb 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.72%. Comparing base (89aac67) to head (a07135c).
Report is 1 commits behind head on main.

@@            Coverage Diff             @@
##             main    #1146      +/-   ##
==========================================
+ Coverage   89.65%   89.72%   +0.07%     
==========================================
  Files         129      129              
  Lines        8378     8420      +42     
==========================================
+ Hits         7511     7555      +44     
+ Misses        867      865       -2     
Flag Coverage Δ
unittests 89.72% <100.00%> (+0.07%) ⬆️

Files with missing lines Coverage Δ
src/neuroconv/tools/hdmf.py 97.65% <100.00%> (+1.02%) ⬆️
..._helpers/_configuration_models/_base_dataset_io.py 98.50% <100.00%> (+1.51%) ⬆️
...roconv/tools/nwb_helpers/_dataset_configuration.py 93.67% <100.00%> (+0.16%) ⬆️

Development

Successfully merging this pull request may close these issues.

[Bug]: Recommended Chunk Shape doesn't take into account compound dtypes
2 participants