Fix chunking bug with compound dtypes #1146

pauladkisson · 2024-11-22T18:28:13Z

Compare Dev Tests after the fix: https://github.com/catalystneuro/neuroconv/actions/runs/12960177532/job/36153641201 (2 failed -- see #1188)

with Dev Tests before the fix: https://github.com/catalystneuro/neuroconv/actions/runs/12995279932/job/36241639533 (8 failed)

pauladkisson · 2024-12-05T00:50:36Z

@rly, lmk what you think

pauladkisson · 2025-01-27T19:12:52Z

@h-mayorquin, this is ready for review. Basically I use the hdmf.build.builders.BaseBuilder to check if a neurodata object would have a compound dtype. Most of the complexity is introduced by the need to find a match between the neurodata object and its location in the builder, which is outlined in the docstrings. Lmk what you think!
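
To make the matching step above concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes a builder can be modelled as a tree of groups and datasets, and that a compound dtype shows up as a list of field descriptions rather than a single type name. `GroupNode`, `DatasetNode`, `find_dataset`, and `has_compound_dtype_sketch` are hypothetical stand-ins and do not use the HDMF API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class DatasetNode:
    """Hypothetical stand-in for a dataset builder; not an HDMF class."""

    name: str
    dtype: Union[str, list]  # assumption: a list of fields marks a compound dtype


@dataclass
class GroupNode:
    """Hypothetical stand-in for a group builder; not an HDMF class."""

    name: str
    children: dict = field(default_factory=dict)

    def add(self, node):
        self.children[node.name] = node
        return node


def find_dataset(root: GroupNode, location: str) -> Optional[DatasetNode]:
    """Search breadth-first for a group from which `location` resolves to a dataset."""
    parts = location.strip("/").split("/")
    queue = deque([root])
    while queue:
        group = queue.popleft()
        node = group
        for part in parts:
            # Walk down the path; stop as soon as a component is missing.
            node = node.children.get(part) if isinstance(node, GroupNode) else None
            if node is None:
                break
        if isinstance(node, DatasetNode):
            return node
        # Not found from this group: try its subgroups next (breadth-first order).
        queue.extend(c for c in group.children.values() if isinstance(c, GroupNode))
    return None


def has_compound_dtype_sketch(root: GroupNode, location: str) -> bool:
    node = find_dataset(root, location)
    return node is not None and isinstance(node.dtype, list)


# Toy file layout, loosely modelled on a plane segmentation with a pixel mask.
root = GroupNode("root")
ophys = root.add(GroupNode("processing")).add(GroupNode("ophys"))
segmentation = ophys.add(GroupNode("PlaneSegmentation"))
segmentation.add(DatasetNode("id", dtype="int64"))
segmentation.add(
    DatasetNode(
        "pixel_mask",
        dtype=[
            {"name": "x", "dtype": "uint32"},
            {"name": "y", "dtype": "uint32"},
            {"name": "weight", "dtype": "float32"},
        ],
    )
)

print(has_compound_dtype_sketch(root, "processing/ophys/PlaneSegmentation/pixel_mask"))
print(has_compound_dtype_sketch(root, "PlaneSegmentation/id"))
```

The breadth-first order lets a location that starts partway down the tree still be resolved, which is one way to handle objects whose recorded location does not begin at the root.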

h-mayorquin · 2025-02-06T01:32:12Z

I did a first reading. Two things:

I think we should update hdmf in the pyproject to the latest version or greater.
Looking at the tests that were failing before, they seem related to the pixel mask. Can we create a more direct test of this in the dataset configuration tests? I think it would be better to have something more unit-test-like that would fail quicker if we break this (or if we refactor). Can we build something simpler with pixel-mask so we don't rely on the full segmentation conversion test?

pauladkisson · 2025-02-06T02:25:52Z

I think we should update hdmf in the pyproject to the latest version or greater.

I updated it to include everything <4, since 4.0 has zarr-related issues. Should be able to add hdmf 4.0 soon -- see: #1191

pauladkisson · 2025-02-06T02:27:23Z

Looking at the tests that were failing before, they seem related to the pixel mask. Can we create a more direct test of this in the dataset configuration tests? I think it would be better to have something more unit-test-like that would fail quicker if we break this (or if we refactor). Can we build something simpler with pixel-mask so we don't rely on the full segmentation conversion test?

Definitely needs some unit tests. I'll put together some.

pauladkisson · 2025-02-06T17:26:24Z

From Meeting: move has_compound_dtype inside get_data_shape

pauladkisson · 2025-02-06T21:50:21Z

From Meeting: move has_compound_dtype inside get_data_shape

Actually, get_data_shape comes from hdmf.utils, so no way to move that in this PR. Also, if I remember correctly, they use get_data_shape in ways that would make incorporating this compound_dtype fix difficult.

pauladkisson · 2025-02-06T21:50:42Z

@h-mayorquin, I added tests, so this should be good to go!

h-mayorquin

My apologies regarding the get_data_shape function.

I propose we create a new function called get_full_data_shape in src/neuroconv/tools/hdmf.py and consolidate all related functionality there. It seems that this should be something provided by HDMF, and isolating this complexity from the rest of the code would be ideal.

There is also a coupling issue with the calculation of the manager for the entire file. I would prefer not to require from_neurodata_object to accept an additional argument that is build related. However, calculating the manager for every dataset is probably too expensive, and using the manager to build only for the dataset doesn't work (based on my initial attempt).

Requests:

Centralize Logic:
Move the code to hdmf.py in the tools directory and create a new function, get_full_data_shape, that takes the builder as an optional argument so that all the logic is centralized. In the docstring, document that get_data_shape fails for compound objects and that the new function's behavior is the one needed for building a dataset I/O configuration object.
Add Tests:
Add a test that asserts the correct behavior of either the new function get_full_data_shape or the dataset I/O configuration produced by the from_neurodata_object method. See the more concrete request in the review.
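
For readers following along, a minimal sketch of the kind of check being requested, assuming the problem is that generic shape inference treats the fields of each pixel-mask row as an extra axis. `sketch_full_data_shape` and its `is_compound` flag are hypothetical names for illustration only; they are not the function added in this PR or part of HDMF.

```python
import numpy as np


def sketch_full_data_shape(rows, is_compound: bool) -> tuple:
    """Return the shape to use for chunking (illustrative only).

    Assumption: compound data is stored as one record per row, so only the
    number of rows counts toward the shape.
    """
    if is_compound:
        return (len(rows),)
    return tuple(np.shape(rows))


# A pixel mask written as (x, y, weight) rows.
pixel_mask = [(0, 0, 0.5), (0, 1, 1.0), (1, 1, 0.25), (2, 3, 0.75)]

# Generic inference sees a 2-D array: 4 rows by 3 fields.
naive_shape = tuple(np.shape(pixel_mask))

# Stored as a structured (compound) array, the same data is 1-D.
compound = np.array(pixel_mask, dtype=[("x", "u4"), ("y", "u4"), ("weight", "f4")])

print(naive_shape)
print(compound.shape)
print(sketch_full_data_shape(pixel_mask, is_compound=True))
```

A unit test along these lines could assert that the shape used for the dataset configuration has one dimension for compound data, without running a full segmentation conversion.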

.../test_backend_and_dataset_configuration/test_models/test_dataset_io_configuration_helpers.py

src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py

.../test_backend_and_dataset_configuration/test_models/test_dataset_io_configuration_helpers.py

...ls/test_backend_and_dataset_configuration/test_models/test_dataset_io_configuration_model.py

h-mayorquin

LGTM

Should we, in another PR, increase the floor of the hdmf version? Remove the ceiling?

pauladkisson · 2025-02-17T20:25:41Z

Should we, in another PR, increase the floor of the hdmf version? Remove the ceiling?

Sure, can you take care of that?

h-mayorquin · 2025-02-17T21:13:56Z

@pauladkisson
I can take care of the ceiling but I am not certain if we should increase the floor. Any idea?

pauladkisson · 2025-02-17T21:18:27Z

I can take care of the ceiling but I am not certain if we should increase the floor. Any idea?

I don't think we need to as long as it remains compatible with spikeinterface, etc.

codecov · 2025-02-17T22:12:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.72%. Comparing base (89aac67) to head (a07135c).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1146      +/-   ##
==========================================
+ Coverage   89.65%   89.72%   +0.07%     
==========================================
  Files         129      129              
  Lines        8378     8420      +42     
==========================================
+ Hits         7511     7555      +44     
+ Misses        867      865       -2

Flag	Coverage Δ
unittests	`89.72% <100.00%> (+0.07%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/neuroconv/tools/hdmf.py	`97.65% <100.00%> (+1.02%)`	⬆️
..._helpers/_configuration_models/_base_dataset_io.py	`98.50% <100.00%> (+1.51%)`	⬆️
...roconv/tools/nwb_helpers/_dataset_configuration.py	`93.67% <100.00%> (+0.16%)`	⬆️

pauladkisson added 4 commits November 22, 2024 10:27

initial fix

9e53812

Merge branch 'main' into fix_dev_tests

3bb63c4

add inputs to dev tests workflow dispatch

ff479cb

removed unused get_spec fn

9fb42ee

pauladkisson mentioned this pull request Nov 22, 2024

Fixes for upcoming hdmf version #1147

Closed

3 tasks

pauladkisson added 4 commits December 4, 2024 14:06

implemented builder-based fix

5298401

implemented builder-based compound dtype check

6e05f7d

added docstrings

9890484

Merge branch 'main' into fix_dev_tests

73e4c39

pauladkisson marked this pull request as ready for review December 5, 2024 00:48

pauladkisson requested a review from rly December 5, 2024 00:50

pauladkisson added 8 commits December 5, 2024 14:17

added fix for top-level datasets like electrodes

330dd22

added fix for stimulus

541eafb

switched to breadth-first search

cd7670b

added support for missing top-level categories like lab_meta_data

d296d7f

added support for links

8f2d1d2

added builder to the tests

64c00ba

Merge branch 'main' into fix_dev_tests

06615cb

Merge branch 'main' into fix_dev_tests

72f49c0

pauladkisson mentioned this pull request Jan 27, 2025

updated dev tests so they can be dispatched manually #1187

Merged

pauladkisson requested a review from h-mayorquin January 27, 2025 19:09

pauladkisson added 2 commits January 27, 2025 11:19

updated changelog

8a772b4

Merge branch 'main' into fix_dev_tests

ea5f99d

pauladkisson added 2 commits February 5, 2025 18:11

updated pyproject.toml

57fdd7e

updated pyproject.toml

872ad6c

pauladkisson and others added 3 commits February 6, 2025 13:39

added tests

fd82b29

[pre-commit.ci] auto fixes from pre-commit.com hooks

f668ad3

for more information, see https://pre-commit.ci

Merge branch 'main' into fix_dev_tests

50b88a8

pauladkisson added 2 commits February 8, 2025 06:00

Merge branch 'main' into fix_dev_tests

23c0c89

Merge branch 'main' into fix_dev_tests

7c17005

h-mayorquin requested changes Feb 13, 2025

pauladkisson added 4 commits February 18, 2025 05:40

Merge branch 'main' into fix_dev_tests

6910808

optional builder

c5882be

full_data_shape in tools.hdmf

5ce4c7b

use Union instead of |

747351b

h-mayorquin reviewed Feb 17, 2025

removed builder from from_neurodata_object test

4dee3a2

pauladkisson enabled auto-merge (squash) February 17, 2025 19:59

removed with builder test of from_neurodata_object

a07135c

h-mayorquin approved these changes Feb 17, 2025

pauladkisson merged commit a896663 into main Feb 17, 2025
40 checks passed

pauladkisson deleted the fix_dev_tests branch February 17, 2025 22:12
