Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
GCS automatically applies decompressive transcoding to gzipped files based on their Content-Encoding metadata. This is problematic for some downstream consumers: transcoding disables range requests, and the Content-Length header does not match the file size the client expects.
The Cache-Control header is particularly useful here: setting it to no-transform instructs GCS to bypass automatic transcoding and serve the file exactly as stored (compressed).
This restores support for HTTP Range Requests and accurate Content-Length headers, allowing ClickHouse (and other parallel processing engines) to download and decompress the files correctly on their end.
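To illustrate the metadata involved, here is a minimal sketch of preparing a gzipped upload. It uses only the Python standard library and does not talk to GCS. The helper name `prepare_gzip_upload`, the sample records, and the `application/x-ndjson` content type are assumptions made for illustration; `contentEncoding` and `cacheControl` are the object metadata fields in the GCS JSON API that correspond to the two headers discussed above.

```python
import gzip
import json


def prepare_gzip_upload(records):
    """Hypothetical helper: gzip newline-delimited JSON and build object metadata."""
    raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    body = gzip.compress(raw)
    metadata = {
        # Assumed content type for newline-delimited JSON.
        "contentType": "application/x-ndjson",
        # Tells GCS the stored bytes are gzip-compressed.
        "contentEncoding": "gzip",
        # Asks GCS to serve the bytes as stored, without transcoding.
        "cacheControl": "no-transform",
    }
    return raw, body, metadata


# Sample records, made up for this sketch.
records = [{"id": i, "message": "hello world"} for i in range(100)]
raw, body, metadata = prepare_gzip_upload(records)

# With no-transform, the client receives the stored (compressed) bytes,
# so the size it sees is len(body), and it decompresses on its own side.
stored_size = len(body)
decoded = gzip.decompress(body)
```

With `cacheControl` set to `no-transform`, the size a client sees should be `stored_size` rather than the decompressed size, which is what allows range requests against the stored object to line up.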
Attempted Solutions
The current workaround is to store raw JSON without compression, but the cost of doing so adds up very quickly.
Proposal
Add support for setting content_encoding and cache_control.
References
No response
Version
No response