feat(grpc): Add gRPC vector bulk ingestion with extra_field_values #1047

Draft
finnegancarroll wants to merge 1 commit into opensearch-project:main from finnegancarroll:bin-field-grpc

@finnegancarroll (Contributor)

Description

Add support for bulk indexing vectors over gRPC using the extra_field_values optimization from OpenSearch PR #20635. Vectors are sent as packed little-endian binary (FloatBinaryLE) via the protobuf extra_field_values side-channel, bypassing JSON serialization entirely.

  • Add build_proto_vector_request() to ProtoBulkHelper for building BulkRequest with FloatBinaryLE encoding in extra_field_values map
  • Add ProtoBulkVectorsFromDataSetParamSource that reads HDF5 vectors as numpy arrays and passes them as binary (no .tolist() conversion)
  • Add ProtoBulkVectorDataSet runner and OperationType (204)
  • Bump opensearch-protobufs dependency from ==1.2.0 to >=1.4.0
  • Add 7 unit tests covering protobuf message construction and param source behavior
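
For readers unfamiliar with the encoding, here is a minimal sketch of what "packed little-endian binary" means for a vector of floats. It assumes FloatBinaryLE carries 32-bit IEEE 754 floats, which the description does not state explicitly, and the helper names `pack_floats_le` and `unpack_floats_le` are hypothetical; they are not part of ProtoBulkHelper or opensearch-protobufs. Only the standard library is used.

```python
import struct


def pack_floats_le(vector):
    """Pack a sequence of floats as contiguous little-endian float32 bytes.

    Assumption: 4 bytes per element (float32). Hypothetical helper,
    for illustration only.
    """
    # "<" selects little-endian with no padding; "f" is a 32-bit float.
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_floats_le(payload):
    """Decode little-endian float32 bytes back into a list of floats."""
    if len(payload) % 4 != 0:
        raise ValueError("payload length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))


# Values chosen to be exactly representable in float32.
vector = [0.5, -1.25, 3.0]
payload = pack_floats_le(vector)
decoded = unpack_floats_le(payload)
```

With a numpy array, the same bytes can likely be obtained without an intermediate Python list via `arr.astype("<f4").tobytes()`, which is in the spirit of the "no .tolist() conversion" point above.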

Issues Resolved

N/A

Testing

  • New functionality includes testing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit a5323d8.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| setup.py | 134 | high | Dependency version constraint for 'opensearch-protobufs' changed from a pinned version (==1.2.0) to an open-ended range (>=1.4.0). This allows any future version to be resolved at install time, which could introduce unvetted or malicious releases. Maintainers should verify the package integrity for all versions in the allowed range and consider using a more restrictive constraint (e.g., >=1.4.0,<2.0.0). |

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 1 | Medium: 0 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
