Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support aggregate function for vector. #463

Merged
merged 6 commits into from
May 6, 2024

Conversation

my-vegetable-has-exploded
Copy link
Contributor

@my-vegetable-has-exploded my-vegetable-has-exploded commented Apr 9, 2024

close #458

This pr mainly contains function including avgsum, vector_dims, vector_norm.

vector_dims(vector)

Returns the dimensions of a given vector.

Arguments:a vector.
Return type integer representing the number of dimensions of the given vector.

vector_norms(vector)

Calculates the Euclidean norm of a given vector.
Arguments:a vector.
Return double precision representing the Euclidean norm of the given vector.

AVG

Calculates the average of the processed vectors.
Arguments
vector: vectors.
Return vector representing the average of processed vectors.

SUM

Arguments: vectors.
Return type vector representing the sum of processed vectors.

The aggregate function support need to add a state transition function sfunc and an optional final calculation function ffunc, following create aggregate document. And we add COMBINEFUNC here to support parallel aggregate.

Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: my-vegetable-has-exploded <[email protected]>
@my-vegetable-has-exploded
Copy link
Contributor Author

I will update the documentation for these functions, as well as the documentation for veci8 later.

/// accumulate intermediate state for vector average
#[pgrx::pg_extern(immutable, strict, parallel_safe)]
fn _vectors_vector_accum(
mut state: pgrx::composite_type!('static, "vectors.vector_accum_state"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the schema name hardcoded now? @usamoi

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@VoVAllen VoVAllen requested a review from usamoi April 9, 2024 13:50
@usamoi usamoi mentioned this pull request Apr 10, 2024
14 tasks
@usamoi
Copy link
Collaborator

usamoi commented Apr 10, 2024

Suggest add vector_l2_norm here. It helps users to take advantage of dot product as distance function.

@my-vegetable-has-exploded
Copy link
Contributor Author

Suggest add vector_l2_norm here. It helps users to take advantage of dot product as distance function.

Is this same as vector_norm ?

@usamoi
Copy link
Collaborator

usamoi commented Apr 10, 2024

Suggest add vector_l2_norm here. It helps users to take advantage of dot product as distance function.

Is this same as vector_norm ?

#255

sorry, I refer to vector_abs_norm

.unwrap()
.unwrap();
check_matched_dims(sum.len(), value.dims());
// TODO: pgrx::Array<T> don't support mutable operations currently, we can reuse the state once it's supported.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double check this solution. It seem that the memory management of AggState is well-designed. For transfn like sfunc( internal-state, next-data-values ) ---> next-internal-state , return internal-state after modification as result is just okay.
See https://github.com/postgres/postgres/blob/52b49b796cc7fd976f4da6aa49c9679ecdae8bd5/src/backend/executor/nodeAgg.c#L81-L100 and https://doxygen.postgresql.org/int8_8c.html#aff6c10f1da41c3aca8253c3e1d6f1378 for more details.
I am going to change the composite_type to a struct. And modify the state in-place.

@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as draft April 10, 2024 15:23
@my-vegetable-has-exploded my-vegetable-has-exploded marked this pull request as ready for review April 11, 2024 09:21

/// accumulate intermediate state for vector average
#[pgrx::pg_extern(immutable, strict, parallel_safe)]
fn _vectors_vector_accum<'a>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meet unused-lifetimes warning here. I don't configure out the reason here. As far as I know , this warning only occurs with #[pgrx::pg_extern].
The 235th line is the same.
cc @usamoi

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro expands to

pub unsafe extern "C" fn _vectors_bvecf32_send_wrapper<'a>(
    _fcinfo: ::pgrx::pg_sys::FunctionCallInfo,
) -> ::pgrx::pg_sys::Datum

The lifetime parameter is redundant here. You could use allow to suppress this warning before pgrx fixes it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just discovered this problem too. After add #![allow(unused_lifetimes)] #![allow(clippy::extra_unused_lifetimes)] in file header, it works. Thanks!

Signed-off-by: my-vegetable-has-exploded <[email protected]>
@@ -0,0 +1,292 @@
#![allow(unused_lifetimes)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VoVAllen
Copy link
Member

@usamoi PTAL

use std::ptr::NonNull;

#[repr(C, align(8))]
pub struct AccumulateStateHeader {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be better if it's more generic so that we use reuse it for more usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only AccumulateState for vecf32 and AccumulateState for vecf16 layout can be same? Just adding a generic type seems won't be so helpful?

Signed-off-by: my-vegetable-has-exploded <[email protected]>
@VoVAllen
Copy link
Member

VoVAllen commented May 6, 2024

@usamoi @my-vegetable-has-exploded Is this ready to merge?

@my-vegetable-has-exploded
Copy link
Contributor Author

@usamoi @my-vegetable-has-exploded Is this ready to merge?

I think it's ready to go now.

@VoVAllen VoVAllen added this pull request to the merge queue May 6, 2024
Merged via the queue into tensorchord:main with commit d81473c May 6, 2024
13 checks passed
JinweiOS pushed a commit to JinweiOS/pgvecto.rs that referenced this pull request May 21, 2024
* implement vector avg.

Signed-off-by: my-vegetable-has-exploded <[email protected]>

* Implement sum、 vector_dims()、 vector_norm()

Signed-off-by: my-vegetable-has-exploded <[email protected]>

* update AccumulateState

Signed-off-by: my-vegetable-has-exploded <[email protected]>

* fix norm.

Signed-off-by: my-vegetable-has-exploded <[email protected]>

* add l2_norm for VectorBorrowed.

Signed-off-by: my-vegetable-has-exploded <[email protected]>

---------

Signed-off-by: my-vegetable-has-exploded <[email protected]>
Signed-off-by: jinweios <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Support vector aggregation function
3 participants