-
Notifications
You must be signed in to change notification settings - Fork 1k
Compression plugin #18314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop2
Are you sure you want to change the base?
Compression plugin #18314
Conversation
…xisting function (#18320) Changelog: omit Docs: omit As suggested in #18314 (comment) This PR will simplify #18314 by modifying the compression functionality on cache save command. Intead of creating a tar object and adding files one by one (very coupled interface with tarfile module), create a list of files which will be passed to a already existing method in conan `compress_files`. This method may be moved to another module in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking much better, easier to assess that the risk is low.
Just some small concern about the pkglist.json of the conan cache restore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is looking good.
Before proceeding, I'd like to see a poc of using zstd and how the possible fallbacks and error handling is done, when mixing packages conan_package.tgz
compressed with gz and with zstd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment about not re-checking the file, otherwise this looks good :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Introduces a plugin hook for compression/decompression by loading a user-supplied compression.py
and delegating to it when present, with fallback to the builtin behavior.
- Loaded
compression.py
via a newconfig.compression_plugin
property and propagated it through all compress/uncompress calls. - Updated
compress_files
anduncompress_file
to invoke the plugin if available. - Added integration tests for plugin loading, fallback behavior, and default output.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
test/integration/extensions/test_compression_plugin.py | added tests for plugin validity and end-to-end usage |
test/integration/command/cache/test_cache_save_restore.py | asserted default compress/decompress outputs when no plugin |
conan/internal/rest/rest_client_local_recipe_index.py | passed config_api into LocalRecipesIndexApp |
conan/internal/rest/remote_manager.py | propagated compression_plugin into uncompress_file |
conan/internal/cache/home_paths.py | added compression_plugin_path property |
conan/internal/api/uploader.py | forwarded compression_plugin to compress_files |
conan/internal/conan_app.py | injected config_api into RemoteManager and LocalRecipesIndexApp |
conan/api/subapi/config.py | implemented compression_plugin loader property |
conan/api/subapi/cache.py | integrated plugin hook into cache save/restore |
Comments suppressed due to low confidence (1)
conan/api/subapi/config.py:248
- The code references
os.path
but there’s noimport os
at the top of this module. Addimport os
to avoid a NameError.
if not os.path.exists(compression_plugin_path):
@@ -158,9 +158,13 @@ def test_cache_save_excluded_folders(): | |||
|
|||
# exclude source | |||
c.run("cache save * --no-source") | |||
# Check default compression function is being used and not compression.py plugin one | |||
assert "Compressing conan_cache_save.tgz\n" in c.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Instead of matching the newline, assert on "Compressing conan_cache_save.tgz" in c.out
to make the test more robust across platforms and output formats.
assert "Compressing conan_cache_save.tgz\n" in c.out | |
assert "Compressing conan_cache_save.tgz" in c.out |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to discuss the fact that any enterprise adding this plugin would not be able to use their old packaged libraries as-is.
I think the implementation should be robust enough that when the plugin is activated and the file to decompress was not compressed by the plugin, but by vanilla Conan, the workflow would then still be valid. This way we would avoid forcing users to recompile the world just to change the compression of their packages, which I think would kill the feature for any already-existing Conan user.
Note that this does NOT mean for the inverse to be true. I think it it's ok for packages that have been compressed by the plugin to only be able to be decompressed with the plugin too, anyone willing to use the plugin at-scale more than likely already has the infra setup to ensure everyone uses the plugin.
Yes, that is a great point!
Of course, if the plugin is not enabled and conan finds out while decompressing that our "constant filename" has been decompressed, it will raise an error as client wouldn't know how to decompress that file. |
6fab859
to
f986651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some minor interface/UI/UX issues, but overall this approach seems much better than the previous iteration.
conan/internal/util/files.py
Outdated
|
||
def _tar_extract_with_plugin(fileobj, destination_dir, compression_plugin, conf): | ||
"""First remove tar.gz wrapper and then call the plugin to extract""" | ||
with tempfile.TemporaryDirectory() as temp_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conan doesn't really use system TemporaryDirectories for anything, it tries to do everything in the Conan cache, in cache folders, so if something goes wrong the files are there, and also there is no "leakage" of pacakges, recipes or code in anywhere else beside the Conan cache, that can be easily wiped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I didn't like the idea of messing up the conan cache. Do we have temporal folder in our conan cache?
Also, there are some other temporaries usages
conan/conan/api/subapi/cache.py
Line 170 in f986651
pkglist_path = os.path.join(tempfile.gettempdir(), "pkglist.json") |
(do not look at the blame, it was me!) but this case is different as it is only a pckglist extraction + removal.
I could move it also to the conan cache and may consider creating a tmp
folder in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand the concern.
Let me think about it.
conan/internal/util/files.py
Outdated
# Check if the tar was compressed with the compression plugin by checking the existence of | ||
# our constant COMPRESSED_PLUGIN_TAR_NAME (without extension as extension is added by the plugin) | ||
if list(Path(temp_dir).glob(f"{COMPRESSED_PLUGIN_TAR_NAME}.*")): | ||
# Get the only extracted file: the plugin tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reference to #18259, we'd eventually like to have our own additional files added to the top-level. Don't assume here that it's the only file. I don't think this requires anything other than clarifying the comment.
# Get the only extracted file: the plugin tar | |
# Extract the actual contents from the plugin tar (ignore other files present). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greetings @mrjoel,
I understand your concern.
Maybe from the perspective of the git diff it is not clear enough.
With this incoming compression plugin you could do whatever you need with the files.
You just have to implement tar_compress
and tar_extract
functions in the compression.py
file.
In that functions, you could add whatever metadata you need.
The thing is that, to avoid issues with the compressed file extension (which could be whatever the user decides), we have decided to wrap that compressed result file into a constant conan_cache_save.tgz
file name.
This wraping is being done with compress level 0 and takes a negligible overhead.
I'll put an example.
Without the plugin:
$ conan cache save "zlib/1.3.1:*"
$ ls -l
.rw-r--r--@ 1,7M perseo 11 jul 16:19 conan_cache_save.tgz
$ tar xzf conan_cache_save.tgz
drwxr-xr-x@ - perseo 11 jul 16:20 b
.rw-r--r--@ 1,7M perseo 11 jul 16:19 conan_cache_save.tgz
.rw-r--r--@ 949 perseo 11 jul 16:19 pkglist.json
drwxr-xr-x@ - perseo 11 jul 16:20 zlib204752602052d
With the plugin, using standard:
$ conan cache save "zlib/1.3.1:*"
$ ls -l
.rw-r--r--@ 1,7M perseo 11 jul 16:19 conan_cache_save.tgz
$ tar xzf conan_cache_save.tgz
.rw-r--r--@ 1,6M perseo 11 jul 16:21 __conan_plugin_compressed_contents__.zst
.rw-r--r--@ 1,6M perseo 11 jul 16:21 conan_cache_save.tgz
$
As you can see, when decompressing the conan_cache_save.tgz
file, we will always get the output file of your tar_compress
function.
And is that file the one which will be passed to the tar_extract
function on your plugin.
In that compressed file, you could have whatever structure and metadata you need. Your tar_extract
could work with that metadata as long as always decompress the files in the way conan expects.
Answering your suggestion: there will always be one file, the unwrapped compressed file from the plugin!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@perseoGI Yep, I'm tracking all of that from the diff.
My main concern was just including the assumption and comment that the outer wrapper would only contain a single file. For #18259 we'd really like it to end up being the following, where we add the custom additional files. We don't expect Conan to know or do anything about it, but do want it to not cause issues.
I'm slightly sad that custom files in the direct wrapper won't be compressed at all, but so be it, we mainly expect files of a few hundred KBs (the HTML graph reports).
$ conan cache save "zlib/1.3.1:*"
$ ls -l
.rw-r--r--@ 1,7M perseo 11 jul 16:19 conan_cache_save.tgz
...
# Other things happen externally to add files directly to conan_cache_save.tgz
...
$ tar xzf conan_cache_save.tgz
.rw-r--r--@ 1,6M perseo 11 jul 16:21 __conan_plugin_compressed_contents__.zst
.rw-r--r--@ xxxx perseo 11 jul 16:21 README.txt
.rw-r--r--@ xxxx perseo 11 jul 16:21 Cache-contents-graph-report.html
.rw-r--r--@ xxxx perseo 11 jul 16:21 Some-other-externally-embedded-file.py
.rw-r--r--@ xxxx perseo 11 jul 16:21 Some-other-externally-embedded-file.exe
.rw-r--r--@ 1,6M perseo 11 jul 16:21 conan_cache_save.tgz
$
As a separate question, why gzip -0
the generated archive at all instead of just having it be a plain native .tar
file? That would certainly make it easier for us to tar -rvf conan_cache_save.tar file1.txt file2.txt file3.html
instead of having to ungzip and regzip a level 0 compression layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further, this approach of requiring the compression plugin to be enabled in a user's Conan home means that it is considered for all projects they may be working. We'd like to be able to provide our conan package archives in a non-interference way, not requiring users of our archive to have to use our compression plugin for everything.
Would you consider an approach where an embedded compression.py
(or __conan_plugin_compression_extension__.py
or some variant) file in the archive level would be used if present? It could then only be used for that archive instead of needing to be installed fully in a user's Conan home?
conan/internal/util/files.py
Outdated
# The tar was not compressed using the plugin, copy files to destination | ||
from conan.tools.files import copy | ||
ConanOutput().debug(f"Extracted in {time.time() - t1} time built in") | ||
copy(None, pattern="*", src=temp_dir, dst=destination_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contents will already be unpacked here and will just be deleted on cleanup of temp_dir
. Instead of copying, would you consider moving them directly? That would seem to improve I/O required, skip the double disk space usage requirement, and likely result in much less I/O overhead from antivirus scanning on Windows.
Also, since I/O may be an appreciable contribution to timing, consider putting the time tracing after the copy/move step.
Add new test case to check those files are correctly pruned after extraction Improve performance by extracting in the actual destination folder even when plugin is enabled
Changelog: Feature: Allow externalization of compression/decompression tasks to a user defined plugin
Docs: conan-io/docs#4107
Close #18259
Close #5209
Close: #18185
Related issues:
Related PRs:
With the aim of simplicity this PR keeps the current behaviour in some compression/decompression tasks such as:
Adding an extra layer which will be in charge of determining if the
compression.py
plugin is defined falling back to existing behaviour if it doesn't.The
compression.py
plugin interface may follow the following structure:Pluging example with zstd: