
Conversation

@michaelfeil (Contributor)

🚀 What

💻 How

🔬 Testing

@michaelfeil (Contributor Author)

@codex review

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +189 to +193
.await
.map_err(|err| anyhow!("Error preparing file: {err}"))?;
file.set_len(check_file_size)
.await
.map_err(|err| anyhow!("Error setting file size: {err}"))?;


P1: Avoid treating failed range downloads as complete

The new range path pre-sizes the target file with set_len before any chunks are fetched (hf_transfer.rs lines 189–193). If download_async later gives up after retries, the partially written file is left on disk with the expected length. download_file_with_cache (download.rs lines 32–38) skips downloads solely by comparing metadata size, so the next run will see the zero/partial file as “complete” and never retry, leading to corrupted model artifacts. Consider deferring pre-allocation or deleting the file on failure so size-based skips don’t mask failed downloads.
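A minimal sketch of the cleanup mitigation the comment suggests, assuming the crate's tokio/anyhow stack; `download_or_cleanup` and its future argument are illustrative names, not the PR's actual API:

```rust
use std::path::Path;

use anyhow::Result;
use tokio::fs;

/// Run a download future and, if it fails, remove the pre-sized (and possibly
/// partially written) target file so a later size-only cache check cannot
/// mistake the leftover file for a completed download.
async fn download_or_cleanup<F>(target: &Path, download: F) -> Result<()>
where
    F: std::future::Future<Output = Result<()>>,
{
    match download.await {
        Ok(()) => Ok(()),
        Err(err) => {
            // Best-effort cleanup; keep the original error as the primary failure.
            let _ = fs::remove_file(target).await;
            Err(err)
        }
    }
}
```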


@michaelfeil (Contributor Author)

Valid comment

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors cloud storage downloads to use pre-signed URLs instead of direct object store operations. The main goal is to leverage the existing fast HTTP range download mechanism (originally designed for HuggingFace) for all cloud storage providers (AWS S3, Azure Blob Storage, and GCS) by generating pre-signed URLs.

Key changes:

  • Removed the cloud_range_download.rs module and its specialized range download logic
  • Refactored storage client functions to return concrete types (AmazonS3, MicrosoftAzure, GoogleCloudStorage) instead of trait objects, enabling access to the Signer trait for pre-signed URL generation (see the sketch after this list)
  • Enhanced hf_transfer.rs with configurable chunk sizes, vectorized disk writes with buffering, and improved error handling for range downloads
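A rough sketch of that pre-signed-URL step, assuming object_store's Signer trait and the http::Method enum the new dependency supplies; this is illustrative, not the PR's exact code:

```rust
use std::time::Duration;

use http::Method;
use object_store::aws::AmazonS3;
use object_store::path::Path;
use object_store::signer::Signer;

/// Generate a pre-signed GET URL for an S3 object. The same call works for the
/// concrete MicrosoftAzure and GoogleCloudStorage types. Signer is a separate
/// trait implemented on the concrete store types rather than part of
/// ObjectStore, which is why returning Box<dyn ObjectStore> would not expose it.
async fn presigned_get_url(store: &AmazonS3, key: &str) -> object_store::Result<url::Url> {
    store
        .signed_url(Method::GET, &Path::from(key), Duration::from_secs(3600))
        .await
}
```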

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| truss-transfer/src/lib.rs | Removed cloud_range_download module declaration |
| truss-transfer/src/cloud_range_download.rs | Deleted entire file containing the cloud-specific range download implementation |
| truss-transfer/src/hf_transfer.rs | Added configurable chunk sizes for different hosts, vectorized disk writes with a fixed-size buffer, file pre-sizing, and improved size validation |
| truss-transfer/src/download_core.rs | Replaced direct object store downloads with pre-signed URL generation and HTTP range requests; simplified download functions (sketch below) |
| truss-transfer/src/download.rs | Added num_workers and semaphore_range_dw parameters to cloud download function calls |
| truss-transfer/src/create/providers/azure.rs | Changed return type boxing to support the concrete MicrosoftAzure type |
| truss-transfer/src/create/providers/aws.rs | Changed return type boxing to support the concrete AmazonS3 type |
| truss-transfer/src/create/azure_metadata.rs | Modified azure_storage to return the concrete MicrosoftAzure type instead of a boxed trait object |
| truss-transfer/src/create/aws_metadata.rs | Modified s3_storage to return the concrete AmazonS3 type instead of a boxed trait object |
| truss-transfer/tests/test_create_bptr_with_aws.py | Added a main block for direct test execution |
| truss-transfer/Cargo.toml | Added http 1.4.0 dependency for the Method enum used in pre-signed URL generation |
| truss-transfer/Cargo.lock | Updated http from 1.2.0 to 1.4.0 |
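For context on how such a pre-signed URL is then consumed by the range download path in hf_transfer.rs and download_core.rs, here is a hedged sketch of a single ranged GET with reqwest; the real code issues many such chunk requests concurrently (bounded by num_workers and a semaphore) and writes them into the pre-sized file at the matching offsets:

```rust
use anyhow::Result;

/// Fetch one inclusive byte range [start, end] of a (pre-signed) URL.
async fn fetch_chunk(
    client: &reqwest::Client,
    url: &str,
    start: u64,
    end: u64,
) -> Result<bytes::Bytes> {
    let resp = client
        .get(url)
        .header(reqwest::header::RANGE, format!("bytes={start}-{end}"))
        .send()
        .await?
        .error_for_status()?; // treat non-success status codes as errors
    Ok(resp.bytes().await?)
}
```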


@michaelfeil (Contributor Author)

@codex review

@chatgpt-codex-connector (bot)

Codex Review: Didn't find any major issues. 🚀

