Skip to content

Implement aggregate functions for sparse vector. #476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jun 4, 2024

Conversation

my-vegetable-has-exploded
Copy link
Contributor

refs #475

Purpose of this pr is almost same with #463, but mainly for sparse vector.

@VoVAllen
Copy link
Member

@usamoi PTAL

@VoVAllen
Copy link
Member

Is this ready to merge?

@my-vegetable-has-exploded
Copy link
Contributor Author

Is this ready to merge?

not yet. I am rewriting the state amd related function.

@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as draft May 17, 2024 07:46
@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as ready for review May 17, 2024 08:00
@VoVAllen
Copy link
Member

@usamoi PTAL

@VoVAllen
Copy link
Member

Ready to merge?

@my-vegetable-has-exploded
Copy link
Contributor Author

ptal @usamoi

usamoi
usamoi previously approved these changes May 20, 2024
Copy link
Collaborator

@usamoi usamoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine but too complex to review.

@VoVAllen
Copy link
Member

Since it's hard to review, can @my-vegetable-has-exploded add more randomized tests and run multiple times locally to ensure the correctness?

@VoVAllen
Copy link
Member

Both end2end and unit test

@my-vegetable-has-exploded
Copy link
Contributor Author

Since it's hard to review, can @my-vegetable-has-exploded add more randomized tests and run multiple times locally to ensure the correctness?

Okay, I'll also look into how to make the code more readable.


# test avg(svector) get the same result as avg(vector)
query ?
SELECT avg(data) = avg(data::svector)::vector FROM test_vectors;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elements from avg(data) is essentially within the range of [2,3], matching the expected value of the generated data ((100+1)/2*0.05) ~= 2.5.

Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as draft May 25, 2024 14:00
Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
#[allow(unused_unsafe)]
unsafe {
pgrx::pg_sys::submodules::panic::pgrx_extern_c_guard(move || {
let mut agg_context: *mut ::pgrx::pg_sys::MemoryContextData = std::ptr::null_mut();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pg will switch to a temporary memory context when call aggregate trans function.
https://github.com/postgres/postgres/blob/52b49b796cc7fd976f4da6aa49c9679ecdae8bd5/src/backend/executor/nodeAgg.c#L761-L801
doxygen
So we need to switch context like https://github.com/postgres/postgres/blob/7c655a04a2dc84b59ed6dce97bd38b79e734ecca/src/backend/utils/adt/numeric.c#L5635-L5648.
Currently pgrx's Aggregate trait can't pass a reference with lifetime. https://github.com/pgcentralfoundation/pgrx/blob/develop/pgrx/src/aggregate.rs. So I expand it. ( Of course, this piece of code is terrible.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel not necessary for using Aggregate trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find a good way to swtich the context under pgrx currently.😭

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just write code in C way. pgrx is more a binding than a complete interface of PostgreSQL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just write code in C way. pgrx is more a binding than a complete interface of PostgreSQL.

I add a macro to wrap it currently.

Signed-off-by: my-vegetable-has-exploded <[email protected]>
@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as ready for review May 31, 2024 12:30
@my-vegetable-has-exploded
Copy link
Contributor Author

ptal @usamoi

Signed-off-by: my-vegetable-has-exploded <[email protected]>
Comment on lines +188 to +192
new_state.merge_in_place(SVecf32Borrowed::new(
dims as u32,
state.indexes(),
state.values(),
));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state may not be a SVecf32 because of zeros. merge_in_place should not accept SVecf32Borrowed.

Copy link
Contributor Author

@my-vegetable-has-exploded my-vegetable-has-exploded Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think zero will be filter in Line115.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what filter_zero is used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly for avg aggregate. Elements in state may be get zero after division by count.

fn _vectors_svecf32_aggregate_sum_finalfunc(state: Internal) -> Option<SVecf32Output> {
match unsafe { state.get_mut::<SVecf32AggregateAvgSumStype>() } {
Some(state) => {
state.filter_zero();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it can be safely removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be reasonable.

@usamoi usamoi added this pull request to the merge queue Jun 4, 2024
Merged via the queue into tensorchord:main with commit d7d8d0b Jun 4, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants