-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Fix offsets for nullable columns #3611
base: main
Are you sure you want to change the base?
[c++] Fix offsets for nullable columns #3611
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3611 +/- ##
==========================================
+ Coverage 86.26% 86.28% +0.01%
==========================================
Files 55 55
Lines 6378 6378
==========================================
+ Hits 5502 5503 +1
+ Misses 876 875 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some additional tests:
- Call Sparse/Dense NDArray
write
with an array that has a validity buffer, and ensure it acts as expected. - For DataFrame, need to verify expected behavior if an index (dimension) column is written with a validity buffer.
- In test_dataframe, I'd also test the "invalid" timestamp types (NaT) alongside the other nullable types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing requests only at this point - the C++ code looks OK (I did a quick pass only - would be good to get @XanthosXanthopoulos or @jp-dark to review it in more depth)
d89b2f4
to
4271442
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the behavior for dense arrays is correct, but otherwise it looks good to me.
70ab5db
to
e8e8405
Compare
e8e8405
to
043b36d
Compare
043b36d
to
716b705
Compare
Updated now with all requested changes except:
Because |
|
||
with soma.SparseNDArray.open(uri, "w") as A: | ||
with raises_no_typeguard(soma.SOMAError): | ||
# soma_joinid cannot be nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, soma_data also can't be nullable - can we add a branch to test for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small (optional) suggestion. Otherwise LGTM!
Issue and/or context:
[sc-62104]
We were adding the offset to the
ArrowArray
's validity buffer directly which is incorrect because the validity is stored as a bit, not as auint8_t
.Changes:
util::bitmap_to_uint8
which takes a pointer to a bitmap, length, and offset, and returns avector<uint8_t>
representation; if the bitmap is a nullptr, then return a nulloptManagedQuery::_cast_bool_data
, formallyutil::cast_bit_to_uint8
, andManagedQuery::_cast_validity_buffer
, which takes the validity buffer of the passed inArrowArray
and returnsvector<uint8_t>
with the properly shifted offsetManagedQuery::setup_write_column
andColumnBuffer::set_data
to take in anstd::optional<std::vector<uint8_t>>
where the validity buffer has already been offset and casted to uint8