Only partial extraction of large tarball #50
Thanks for the report. I've not had a chance to try to reproduce this yet. My first guess would be that PHP is hitting memory exhaustion. How much memory is allocated / available? Are there any error messages beyond the warnings shown? What PHP version are we talking about here?
Was able to reproduce this with a quick test in a PHP shell. I was left with only a fraction of the archive's contents, which is definitely not the full content of the archive. That's on a machine with 32G of memory, FWIW, and extracting the same archive with the standard tar utility completes without problems. So yup, looks like a bug.
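For reference, a minimal sketch of the kind of quick reproduction described above, using the standard Archive_Tar API (the archive path and target directory are placeholders, not the values actually used in the session):

```php
<?php
// Minimal reproduction sketch: extract a tarball containing a file
// larger than 8 GiB and compare the result against the archive listing.
require_once 'Archive/Tar.php';

$tar    = new Archive_Tar('/tmp/stable-diffusion-xl-base-1.0.tar'); // placeholder path
$result = $tar->extract('/tmp/extracted');                          // true on success

var_dump($result);
// Listing the archive contents shows what *should* have been extracted:
foreach ($tar->listContent() as $entry) {
    echo $entry['filename'], ' (', $entry['size'], " bytes)\n";
}
```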
I've only been able to have a quick look at this so far, but I'm wondering if it might be some bad data inside the tarball, which presumably the standard tar utility is able to handle. The problem seems to happen when Archive_Tar tries to process the record for the large file. (I am not an expert on this code and I may get some terminology and/or interpretation wrong.) It tries to extract the metadata from the record, and seems to get a bad/corrupt value for the size. Not surprisingly, it looks like things go bad from there.
I'm not sure if this really is a problem in the data, or the manifestation of Archive_Tar's inability to process the data properly. I think it could be the former, but you could argue that it's both if other tools are able to successfully extract the archive.
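For readers unfamiliar with the format, a rough sketch of the metadata-extraction step being described, assuming a ustar-style header layout (the format string below is illustrative, not Archive_Tar's actual code, and the archive path is a placeholder):

```php
<?php
// Sketch of reading one 512-byte tar header record and unpacking the
// ustar metadata fields (field widths per the ustar spec).
$fp     = fopen('/tmp/big.tar', 'rb');   // placeholder archive
$record = fread($fp, 512);

$fields = unpack(
    'a100name/a8mode/a8uid/a8gid/a12size/a12mtime/a8checksum/' .
    'a1typeflag/a100linkname/a6magic/a2version/a32uname/a32gname',
    $record
);

// For files under 8 GiB the size field is an octal ASCII string;
// a corrupt-looking value here is the first symptom of this bug.
var_dump($fields['size'], octdec(trim($fields['size'], " \0")));
fclose($fp);
```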
After doing a few experiments (e.g. re-creating the archive with tar), it looks like maybe the data is okay after all; it's just that Archive_Tar cannot handle the very large file within the archive.
I suspect it wasn't very common to work with files of that size in PHP when most of the code in this project was written. I'll have a closer look at what happens when Archive_Tar tries to process this file within the archive, but it looks like for some reason it's not able to parse the size of the file properly.
https://en.wikipedia.org/wiki/Tar_(computing) describes the relevant limitation: in the historical tar header the file size is stored as an 11-digit octal number, which caps individual archive members at 8 GiB.
Looks like that's the limitation that's being hit when Archive_Tar processes this archive. So I think this is not so much a bug as a feature request to implement the extension to the tar format whereby files > 8G are supported. At present, they are not.
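As a quick sanity check on that figure, the largest value an 11-digit octal size field can represent is one byte short of 8 GiB:

```php
<?php
// The classic tar size field holds 11 octal digits (plus a terminator),
// so the largest representable member size is:
$max = octdec('77777777777');
var_dump($max);                                  // int(8589934591)
var_dump($max === 8 * 1024 * 1024 * 1024 - 1);   // bool(true): 8 GiB - 1
```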
Mmh, what about this PR? #15
Yeah, that looks like it's implementing support for the right thing, but sadly it doesn't seem to be working properly.
More background info on tar formats: https://www.gnu.org/software/tar/manual/html_node/Formats.html#Formats
As far as I can see, Archive_Tar does support ustar but possibly not posix (or at least the implementation doesn't work fully).
https://mort.coffee/home/tar/ looks like a pretty good write-up (I think I recognise the author :) ).
I'm not sure yet whether the data that gets passed to the new \Archive_Tar::_tarRecToSize method has been incorrectly extracted from the header / archive, or whether there's a problem within the processing in the method itself. As noted earlier, the extracted size value looks bad/corrupt.
I've run out of time to look at this for now, but will try to come back to it ASAP.
As far as I can see, this problem goes away if we revert to using the `a` unpack format (instead of `Z`) when reading the size field from the header.
I'm not yet exactly sure why. https://mort.coffee/home/tar/ describes the extension used for large sizes: when the high bit of the first byte of the size field is set, the rest of the field is read as a base-256 (raw binary) number instead of octal ASCII digits. That does seem to be what the _tarRecToSize method implements. I'd need to understand all the details better before deciding how to approach fixing this properly.
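To illustrate, here is a minimal sketch of that base-256 scheme (this is the idea from the article, not Archive_Tar's actual `_tarRecToSize` implementation; `recToSize` is a hypothetical name):

```php
<?php
// Sketch of the GNU/star large-size extension: if the top bit of the
// first byte of the 12-byte size field is set, the field holds a
// big-endian base-256 (binary) number instead of octal ASCII digits.
function recToSize(string $field): int
{
    if ((ord($field[0]) & 0x80) !== 0) {
        $size = ord($field[0]) & 0x7f;       // mask off the flag bit
        for ($i = 1; $i < strlen($field); $i++) {
            $size = ($size << 8) | ord($field[$i]);
        }
        return $size;
    }
    return octdec(trim($field, " \0"));      // classic octal encoding
}

// A 9 GiB size, encoded base-256: flag byte, 3 NULs, then 8 value bytes.
$nineGiB = 9 * 1024 * 1024 * 1024;
$field   = "\x80" . str_repeat("\0", 3) . pack('J', $nineGiB);
var_dump(recToSize($field) === $nineGiB);    // bool(true)
```

Note the embedded NUL bytes in the encoded field: unpack's `Z` format stops at the first NUL, which would truncate exactly this kind of binary value, whereas `a` preserves the raw bytes. That fits the observation that reverting the format makes the problem go away.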
I've reproduced the bug and the fix/workaround (changing the unpack format for the size field from `Z` back to `a`).
The change in the behaviour of the `a` and `Z` unpack formats happened in PHP 5.5. Some of the relevant docs seem to have disappeared, but it looks like we're interested in: https://bugs.php.net/bug.php?id=63738
That bug report provides an example of how to maintain backwards-compatibility, which closely resembles the change that was committed to Archive_Tar introducing the `Z` format.
184af11 is the change we've looked at that implemented the parsing of > 8G sizes within tar headers; that was committed late in 2015. You'd assume that commit worked on archives containing large files at some point, in conjunction with the newer unpack format. I don't know if there have been more changes in PHP's behaviour since then which mean we're seeing this bug today, or whether there's another explanation.
I'm not certain the change from `a` to `Z` is safe to simply revert; some testing on different PHP versions etc. may be enlightening.
@marcelklehr are you able to confirm whether changing the format to use `a` fixes extraction for you?
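For context, the backwards-compatibility approach being referred to is usually a runtime switch on the PHP version (a hypothetical sketch, not the exact code from the bug report or from Archive_Tar):

```php
<?php
// PHP 5.5 changed unpack's 'a' code to keep trailing NUL bytes and
// introduced 'Z' with the old NUL-stripping behaviour. Code that has
// to run on both sides of that change can pick the code at runtime:
$fmt = version_compare(PHP_VERSION, '5.5.0-dev', '>=') ? 'Z' : 'a';

$headerField = str_pad('hello', 100, "\0");   // fake 100-byte name field
$data = unpack($fmt . '100filename', $headerField);
var_dump($data['filename']);                  // string(5) "hello" on both
```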
I've submitted a PR that adds a test to check if the size of a large (> 8G) file can be read successfully. Changing the unpack format back to `a` makes the test pass.
I'd be quite happy to drop support for anything older than PHP 5.6 personally. @ashnazg are you able to review this issue and the PR please? If you're happy with this direction, perhaps we could drop support for anything before PHP 5.6 and look at only having one format for the unpack. There's a broader question of whether we ever needed to change from `a` to `Z` in the first place.
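A sketch of how such a test fixture can be produced without 9 GiB of real data, assuming a filesystem that supports sparse files and GNU tar on the path (illustrative only, not the PR's actual test):

```php
<?php
// Create a sparse 9 GiB file (occupies almost no disk space) and tar it;
// GNU tar stores the >8 GiB size using the base-256 extension. Note that
// tar still has to read the 9 GiB of zeros, so this is not instant.
$big = '/tmp/big.bin';
$fp  = fopen($big, 'wb');
ftruncate($fp, 9 * 1024 * 1024 * 1024);
fclose($fp);
exec('tar -cf /tmp/big.tar -C /tmp big.bin');

// The interesting assertion: does Archive_Tar report the right size?
require_once 'Archive/Tar.php';
$tar  = new Archive_Tar('/tmp/big.tar');
$list = $tar->listContent();
var_dump((int) $list[0]['size'] === 9 * 1024 * 1024 * 1024);
```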
Perhaps the change to support the larger sizes was worked on under PHP 5.5 or thereabouts, so was never affected by the change in unpack behaviour.
Thank you again for digging into this!!
Yes, this resolves the issue for me!
File is available here: download.nextcloud.com/server/apps/text2image_stablediffusion/stable-diffusion-xl-base-1.0
Result: only part of the archive's folders are extracted, plus some warnings during extraction.
Expected behavior
Should extract all folders.