-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[ENH]: Plumb prefix path all the way to the bf writer #4743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 05-30-_enh_return_database_id_in_get_collections_call_from_sysdb
Are you sure you want to change the base?
Conversation
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Propagate Prefix Path to Blockfile Writer for Per-Segment S3 Hierarchy This PR introduces explicit plumbed prefix pathing, passing tenant and database ID from the compaction orchestrator all the way through to the blockfile writer (bf writer) in the storage layer. The prefix path, formatted as tenant/{tenant}/database/{database_id}/collection/{collection_id}/segment/{segment_id}, is now constructed within segment writers and used in all blockfile writer and reader construction, affecting record, metadata, and vector/blog-indexing segments. Comprehensive refactoring was performed to enforce explicit prefix path propagation, replace default constructions, and update tests, benchmarks, and library usage accordingly. Key Changes: Affected Areas: Potential Impact: Functionality: Blockfile-level operations are now fully namespace-scoped to the correct tenant/database/collection/segment paths; improper or missing prefix propagation will now fail at construction. Performance: None anticipated for runtime paths; test/bench setup slightly more explicit, marginally more overhead in constructing options. Security: Improves logical data isolation by enforcing per-tenant/db/collection/segment scoping. Scalability: Improves maintainability and future scalability for multi-tenant deployments and S3 pathing, ensures explicit partitioning at storage level. Review Focus: Testing Needed• Run full suite of unit tests, integration tests, and proptests for blockfile, segment, and index paths. Code Quality Assessmentrust/types/src/collection.rs: DatabaseUuid added and now present in per-segment path construction. rust/blockstore/src/types/writer_options.rs: Refactored to disallow default options, forces explicit prefix. Author notes that this is intentional to avoid silent errors. rust/segment/src/blockfile_metadata.rs: Signature change and full adoption of tenant/db/coll/seg prefix propagation. No violations of explicitness; test code modernized. rust/segment/src/blockfile_record.rs: Mirror changes as above; read/write code now correctly includes contextual path. tests/benches: Updated; all calls now pass prefix_path. Benchmarks assign prefix (generic, empty unless operationalized, but passes the compiler gate). Best PracticesTest-Maintenance: Multi-Tenancy/Scoping: Api-Design: Potential Issues• Downstream/unupdated out-of-tree project code will break upon upgrade due to breaking changes in BlockfileWriterOptions construction. This summary was automatically generated by @propel-code-bot |
} | ||
|
||
impl BlockfileWriterOptions { | ||
pub fn new() -> Self { | ||
Self::default() | ||
pub fn new(prefix_path: String) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is intentional to force all callers to necessarily think about and set a prefix path. There is no default constructor now.
4411701
to
f9a9e60
Compare
573315d
to
c24e5b0
Compare
c24e5b0
to
a220d89
Compare
@@ -1928,9 +1945,9 @@ mod tests { | |||
|
|||
// Test that a v1.1 writer can read a v1 blockfile and dirty a block | |||
// successfully hydrating counts for ALL blocks it needs to set counts for | |||
|
|||
let prefix_path = String::from(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - create a const to hold this
@@ -536,7 +536,9 @@ impl SpannIndexWriter { | |||
blockfile_id: &Uuid, | |||
blockfile_provider: &BlockfileProvider, | |||
) -> Result<BlockfileWriter, SpannIndexWriterError> { | |||
let mut bf_options = BlockfileWriterOptions::new(); | |||
// TODO(Sanket-temp): Change this suitably |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do we want to add any basic sanity testing around the migration of the path?
- Should we align with other services prefixes?
segment_id: &SegmentUuid, | ||
) -> String { | ||
format!( | ||
"{}/{}/{}/{}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is different than the prefix used for version files and lineage files ({}/databases/{}/collections/{}
), I think we should use the same prefix across everything
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
None