-
Notifications
You must be signed in to change notification settings - Fork 92
Mf/presigned-urls #2063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Mf/presigned-urls #2063
Conversation
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| .await | ||
| .map_err(|err| anyhow!("Error preparing file: {err}"))?; | ||
| file.set_len(check_file_size) | ||
| .await | ||
| .map_err(|err| anyhow!("Error setting file size: {err}"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid treating failed range downloads as complete
The new range path pre-sizes the target file with set_len before any chunks are fetched (hf_transfer.rs lines 189–193). If download_async later gives up after retries, the partially written file is left on disk with the expected length. download_file_with_cache (download.rs lines 32–38) skips downloads solely by comparing metadata size, so the next run will see the zero/partial file as “complete” and never retry, leading to corrupted model artifacts. Consider deferring pre-allocation or deleting the file on failure so size-based skips don’t mask failed downloads.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors cloud storage downloads to use pre-signed URLs instead of direct object store operations. The main goal is to leverage the existing fast HTTP range download mechanism (originally designed for HuggingFace) for all cloud storage providers (AWS S3, Azure Blob Storage, and GCS) by generating pre-signed URLs.
Key changes:
- Removed the
cloud_range_download.rsmodule and its specialized range download logic - Refactored storage client functions to return concrete types (AmazonS3, MicrosoftAzure, GoogleCloudStorage) instead of trait objects to enable access to the
Signertrait for pre-signed URL generation - Enhanced
hf_transfer.rswith configurable chunk sizes, vectorized disk writes with buffering, and improved error handling for range downloads
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| truss-transfer/src/lib.rs | Removed cloud_range_download module declaration |
| truss-transfer/src/cloud_range_download.rs | Deleted entire file containing cloud-specific range download implementation |
| truss-transfer/src/hf_transfer.rs | Added configurable chunk sizes for different hosts, vectorized disk writes with fixed-size buffer, pre-file sizing, and improved size validation |
| truss-transfer/src/download_core.rs | Replaced direct object store downloads with pre-signed URL generation and HTTP range requests; simplified download functions |
| truss-transfer/src/download.rs | Added num_workers and semaphore_range_dw parameters to cloud download function calls |
| truss-transfer/src/create/providers/azure.rs | Changed return type boxing to support concrete MicrosoftAzure type |
| truss-transfer/src/create/providers/aws.rs | Changed return type boxing to support concrete AmazonS3 type |
| truss-transfer/src/create/azure_metadata.rs | Modified azure_storage to return concrete MicrosoftAzure type instead of boxed trait object |
| truss-transfer/src/create/aws_metadata.rs | Modified s3_storage to return concrete AmazonS3 type instead of boxed trait object |
| truss-transfer/tests/test_create_bptr_with_aws.py | Added main block for direct test execution |
| truss-transfer/Cargo.toml | Added http 1.4.0 dependency for Method enum in pre-signed URL generation |
| truss-transfer/Cargo.lock | Updated http from 1.2.0 to 1.4.0 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
🚀 What
💻 How
🔬 Testing