Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting Content-Type when uploading a Zarr entry #2171

Closed
wants to merge 3 commits into from

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Feb 5, 2025

Closes #1870.

@dandi/dandiarchive Should I add tests for this? If so, how?

@jwodder jwodder added the zarr Issues with Zarr hosting/processing/etc. label Feb 5, 2025
self.storage.generate_presigned_put_object_url(self.s3_path(o['path']), o['base64md5'])
self.storage.generate_presigned_put_object_url(
self.s3_path(o['path']), o['base64md5'], content_type=o['content_type']
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while at it, should may be path_md5s to be renamed (it is internal, should be ok AFAIK) into e.g. path_objs or path_recs to signal that it is not just a list (or mapping) of paths to md5s?

@yarikoptic
Copy link
Member

@dandi/dandiarchive Should I add tests for this? If so, how?

now that you added to minio interface, I think test should be adjusted and/or added to verify that functioning to https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/zarr/tests/test_zarr_upload.py

@jwodder
Copy link
Member Author

jwodder commented Feb 7, 2025

@yarikoptic None of the tests in that file both upload to a Zarr and request the uploaded entry from minio, so I'm not sure exactly how to proceed.

@dandi/dandiarchive Please advise.

Copy link
Member

@waxlamp waxlamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me. Jake will advise on how to go about testing.

@jwodder
Copy link
Member Author

jwodder commented Feb 12, 2025

I tested this in the client using this code, with Docker Compose adjusted to point to a local build of this PR. I determined that the client does need to set the "Content-Type" header when uploading to the signed URL, but I also found that sending the "Content-Type" to the archive for inclusion in the signed URL is unnecessary (for minio, at least). If we can confirm that the same thing happens on S3, that would make this PR superfluous.

@jjnesbitt
Copy link
Member

I tested this in the client using this code, with Docker Compose adjusted to point to a local build of this PR. I determined that the client does need to set the "Content-Type" header when uploading to the signed URL, but I also found that sending the "Content-Type" to the archive for inclusion in the signed URL is unnecessary (for minio, at least). If we can confirm that the same thing happens on S3, that would make this PR superfluous.

Regarding testing, Minio and S3 differ in some ways, which is part of the reason there can't really be good tests for these features. This is one of the friction points between our local and prod deployments.

Regarding the feature itself, according to my research, if Content-Type is provided as a param to the pre-signed URL generation, then the client using the pre-signed URL for upload must include that header, to "match" the pre-signed URL (as is the case with other params like ContentMD5). However, if that param is not included in the pre-signed URL params, the client may supply that header themselves, and it will be honored.

I'm unable to test this in S3 myself at the moment, but what I've found would seem to match the results of your test. So if that truly is the case, this is sort of a policy question, more than anything else. Do we want to require Content-Type to be included alongside other params like ContentMD5 at URL generation time (and therefore enforced once the URL is "wielded")? Or should we leave that control up to the client?

To me, I agree it seems entirely superfluous to include that behavior here, since in either case, the responsibility of providing the Content-Type header lies with the client. The only difference is that with this code change, the client would need to additionally supply it at pre-signed URL generation time as well.

@jwodder
Copy link
Member Author

jwodder commented Feb 13, 2025

I was able to confirm (via a test upload to staging) that setting the Content-Type in the signed URL is unnecessary on S3. Closing this PR as superfluous.

@jwodder jwodder closed this Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
zarr Issues with Zarr hosting/processing/etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend Zarr entry upload procedure to support client-assigned Content-Types
4 participants