shrink.sh script causes build failure on VMware Fusion 7 and Yosemite #35

Open
dwtj opened this issue May 16, 2015 · 31 comments

Comments

@dwtj

dwtj commented May 16, 2015

I am trying to use osx-vm-templates to get an OS X VM running in VMware Fusion. My test system:

  • Commit: 0cad0b9 (with minor modifications to template.json)
  • Host OS: OS X Yosemite 10.10.3
  • Guest OS: OS X Yosemite 10.10.3
  • VMware Fusion: Professional Version 7.1.1 (2498930)
  • Vagrant: 1.7.2
  • Packer: 0.7.5

At the end of packer build template.json, the vmware-iso builder fails with the following error messages:

==> vmware-iso: Provisioning with shell script: ../scripts/shrink.sh
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
    vmware-iso: Progress: 100 [===========>]
==> vmware-iso: Gracefully halting virtual machine...
==> vmware-iso: Failed to send shutdown command: dial tcp 172.16.96.142:22: i/o timeout
==> vmware-iso: Stopping virtual machine...
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Failed to send shutdown command: dial tcp 172.16.96.142:22: i/o timeout

==> Some builds didn't complete successfully and had errors:
--> vmware-iso: Failed to send shutdown command: dial tcp 172.16.96.142:22: i/o timeout

==> Builds finished but no artifacts were created.

While the shrink process was running, I noticed a pop-up box in the guest VM warning that there was no disk space left. (The build log does say that such warnings should be disregarded.) I didn't click "OK" when I had the chance. I suspect the VM could not be shut down gracefully because this pop-up was never closed.

I haven't yet checked whether closing this pop-up in time would save the build. However, I have confirmed that removing shrink.sh from the list of scripts in template.json allows the build to run to completion. Presumably, removing just the call to vmware-tools-cli would be a sufficient workaround:

# VMware Fusion specific items
if [ -e .vmfusion_version ] || [[ "$PACKER_BUILDER_TYPE" == vmware* ]]; then
    # Shrink the disk
    sudo /Library/Application\ Support/VMware\ Tools/vmware-tools-cli disk shrink /
fi
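
A minimal sketch of that workaround, assuming a hypothetical SKIP_SHRINK variable (not something the templates define) that lets a build opt out of the shrink without editing the script list. The guard reuses the condition above; everything else is illustrative.

```shell
#!/bin/bash
# SKIP_SHRINK is a hypothetical opt-out switch, not part of osx-vm-templates.
should_shrink() {
    # Bail out early when the caller has asked to skip the shrink step.
    [ "${SKIP_SHRINK:-0}" != "1" ] || return 1
    # Same VMware detection as the condition in the existing shrink.sh.
    [ -e .vmfusion_version ] || [[ "${PACKER_BUILDER_TYPE:-}" == vmware* ]]
}

# Intended use inside shrink.sh (commented out so sourcing this file is harmless):
# if should_shrink; then
#     sudo "/Library/Application Support/VMware Tools/vmware-tools-cli" disk shrink /
# fi
```

With Packer, the variable could presumably be passed in through the shell provisioner's environment_vars setting.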
@timsutton
Owner

Interesting - @lstoll just added the shrink script a few days ago, and this didn't happen to me in testing the latest versions of 10.7-10.10.

I suspect it's not an issue with dismissing the GUI popup. Packer wants to send the command shutdown -h now over SSH, but if the system alerted you that there was zero free space, it's quite possible that this caused the timeout when sending the command (dial tcp 172.16.96.142:22: i/o timeout): the system may not have been responding well because it had no free space to allocate anything, create temp files, etc.

I think the cases where GUI elements or unsaved documents block logout/shutdown are those where they're part of timed logout processes initiated at a higher level. If shutdown -h now were succeeding, the system would eventually shut down (although with no free space it's possible it would hang in the process).
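
One way to test the free-space theory would be to log how much space is left once the shrink returns. This is only a sketch: free_kb is a hypothetical helper, and nothing like it exists in shrink.sh today.

```shell
#!/bin/bash
# free_kb: print the available space, in 1024-byte blocks, on the filesystem
# holding the given path (defaults to /). Hypothetical helper for diagnostics.
free_kb() {
    # -P forces the single-line POSIX layout; column 4 is "Available".
    df -Pk "${1:-/}" | awk 'NR==2 {print $4}'
}

# Possible use at the end of shrink.sh:
# echo "Free space after shrink: $(free_kb /) KB"
```

If the number printed right before the shutdown step is close to zero, that would support the idea that the guest is too starved to accept the SSH connection.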

I'm not really surprised this happened, and expect it's the kind of thing that could be intermittent or more likely on different hardware just due to speed differences. Are you able to get this happening repeatedly? Wondering if @lstoll has any suggestions for how the disk shrink step could be modified?

With regard to shrinking disks, I used to see a lot of Linux Packer templates zero out free space to aid box compression. I wonder if that's worth investigating in the shrink script, potentially as an alternative to asking VMware Tools to shrink the disk. As far as I can tell, the shrink script currently doesn't deal with any non-zero spare blocks, and handling them could give more mileage, at least once the resulting image is compressed (though potentially with a larger uncompressed size).

@mitchty

mitchty commented May 17, 2015

Zeroing out free space helps, but it is nothing like as dramatic as what was claimed for the shrink script.

From my own use of this script:
https://github.com/mitchty/osx-vm-templates/blob/myconfigs/scripts/zerofreespace.sh

It reduces my builds by about a gigabyte (from about 8 GB to 7; I include a full copy of Xcode.app, so reducing size is helpful). I haven't yet merged all the upstream updates to see whether I can recreate this issue, or whether there is a size difference when using both the shrink script and the zeroing. I could sync things and give it a go by updating the shrink script to zero free space too.

@dwtj
Author

dwtj commented May 17, 2015

I ran the packer build process again, this time from a clean git clone of osx-vm-templates (and none of my customizations to template.json). I ran into the build error again:

==> vmware-iso: Provisioning with shell script: ../scripts/shrink.sh
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
    vmware-iso: Progress: 100 [===========>]
==> vmware-iso: Gracefully halting virtual machine...
==> vmware-iso: Failed to send shutdown command: dial tcp 172.16.96.145:22: i/o timeout
==> vmware-iso: Stopping virtual machine...
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Failed to send shutdown command: dial tcp 172.16.96.145:22: i/o timeout

(You might notice that the log looks a little different this time. My understanding is that this is because the VirtualBox builder is still running. In the tests described in my original post, I had removed the VirtualBox builder, so the whole packer build was done once the vmware-iso build failed.)

I ran this test on the same system. Its specs:

  • MacBook Pro (Retina, 13-inch, Late 2013)
  • Processor: 2.4 GHz Intel Core i5
  • Memory: 8 GB 1600 MHz DDR3

dwtj pushed a commit to dwtj/osx-vmware-builder that referenced this issue May 17, 2015
- Removed unwanted builders (only `vmware-iso` is left).
- Increased disk size.
- Bumped darwin version number to match Yosemite.
- Increased VM's main memory size.
- Bumped VMware hardware compatibility version.
- Removed unwanted scripts.
- Removed `shrink.sh` script because it causes build failure. See
  [Issue timsutton#35](timsutton#35).
@mitchty

mitchty commented May 17, 2015

So I got the same issues as dwtj when running on the really slow hard drives in my Mac mini. When I ran on much faster spinning rust I got the same. That led me to this Packer issue:
hashicorp/packer#1029

It suggested using two scripts fields, so I did this (the hack script is nothing more than an echo):
https://github.com/mitchty/osx-vm-templates/blob/myconfigs/packer/mytemplate.json#L111-L115

And that worked. Versus my normal 10G image I now have a 7.1G image. This is with zeroing free space like so https://github.com/mitchty/osx-vm-templates/blob/myconfigs/scripts/shrink.sh#L12-L14 . Haven't yet looked at the size with just the vmware tool compression. But try out the two scripts fields to see if things work.

From my understanding, what happens is that Packer thinks the machine is done executing scripts and then decides to issue a shutdown. With a second scripts field, it will try to run that script, wait for it to time out, and then issue the shutdown. That's my guess; either way, it works with two scripts fields on VMware.

I also built only with Packer's -only vmware-iso, so I don't think VirtualBox has anything to do with the issue.

@timsutton
Owner

Nice work! I'll start testing this.

Since diskutil secureErase can accept a mount point we may be able to just give it / instead of ${slash}.
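
A rough sketch of what passing the mount point directly might look like. Level 0 is diskutil's single-pass zero fill; the DRY_RUN switch and the zero_free_space name are assumptions added here so the command can be inspected without touching a disk.

```shell
#!/bin/bash
# zero_free_space: overwrite the free space of a volume with zeros.
# DRY_RUN is a hypothetical switch; when set to 1 the command is only printed.
zero_free_space() {
    local target="${1:-/}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "diskutil secureErase freespace 0 ${target}"
        return 0
    fi
    # Level 0 = single-pass zero fill of the volume's free space.
    sudo diskutil secureErase freespace 0 "${target}"
}

# Possible use in shrink.sh before the VMware Tools shrink:
# zero_free_space /
```

Whether every supported OS X release accepts a mount point here instead of a device node is untested.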

@timsutton
Owner

Hm. If I define scripts twice in a shell provisioner step as you do here, it just jumps to the second block to execute only my no-op script (equivalent to your packerhack.sh script):

==> vmware-iso: Uploading the 'darwin' VMware Tools
==> vmware-iso: Uploading ../scripts/support/set_kcpassword.py => /private/tmp/set_kcpassword.py
==> vmware-iso: Provisioning with shell script: ../scripts/packer-issue-1029.sh
==> vmware-iso: Gracefully halting virtual machine...

With your template, it actually does both scripts fields? Or am I missing something?

@mitchty

mitchty commented May 17, 2015

Gah you're right, didn't look through the history far enough. Will try adding it to the end after the shrink script to see if it works or not. This mini is rather slow though so it can take time.

As for / vs /dev/rdiskN, I recall older versions of diskutil not being able to use a mount point. I would have to validate that assumption.

@lstoll
Contributor

lstoll commented May 17, 2015

Sorry, I'm away from my computer at the moment. I suspect that the timeout is just the normal packer one, because the shrink does take a bunch of time and will be somewhat IO intensive.

I'll be online in a couple of hours, so I'll dig in more and provide some more background on tools shrink vs. zeroing

@mitchty

mitchty commented May 17, 2015

Alright so putting the hack script in the same scripts block worked. mitchty@6f30749

Apologies for the escape characters, I was running under mosh (no scroll back) and was tee'ing the output to a log file:

==> vmware-iso: Provisioning with shell script: ../scripts/shrink.sh^[[0m
^[[0;32m    vmware-iso: Zeroing out free space^[[0m
^[[0;32m    vmware-iso: Creating a secondary temporary file^[[0m
^[[0;32m    vmware-iso: Mounting disk^[[0m
^[[0;32m    vmware-iso: Finished erase on disk1s2 Macintosh HD^[[0m
^[[0;32m    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.^[[0m
^[[0;32m    vmware-iso: Progress: 100 [===========>]^[[0m
^[[0;32m    vmware-iso: Disk shrinking complete.^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[0;32m    vmware-iso:^[[0m
^[[1;32m==> vmware-iso: Provisioning with shell script: ../scripts/packerhack.sh
^[[0;32m    vmware-iso: meh^[[0m
^[[1;32m==> vmware-iso: Gracefully halting virtual machine...^[[0m
^[[0;32m    vmware-iso: Waiting for VMware to clean up after itself...^[[0m
^[[1;32m==> vmware-iso: Deleting unnecessary VMware files...^[[0m
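
For reference, a stand-in for the no-op script, reconstructed only from the log above (it prints "meh" and does nothing else). The function wrapper is an addition for illustration; the real packerhack.sh may differ.

```shell
#!/bin/bash
# Stand-in for packerhack.sh: gives Packer one more script to run after
# shrink.sh so the shutdown command is not sent right after the shrink.
packer_hack() {
    echo "meh"
}

packer_hack
```

The important part is the ordering: it has to be listed after shrink.sh in the same scripts array.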

But to be honest, the difference between using the VMware shrink and just zeroing things out via diskutil isn't all that huge. Here is a Yosemite box I built last month using just the diskutil zero free space, and the box from doing both:

du -ks yosemite.box *.box                                           
10566720        yosemite.box
10350848        packer_vmware-iso_vmware.box

~200 megs isn't a huge win, but that's just one data point, so I'm not sure it holds.
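
Since du -ks reports sizes in 1024-byte blocks, the saving can be worked out directly from the two figures above. A small sketch (size_diff_mb is a hypothetical helper name):

```shell
#!/bin/bash
# size_diff_mb: difference between two `du -k` sizes, in whole megabytes.
size_diff_mb() {
    echo $(( ($1 - $2) / 1024 ))
}

size_diff_mb 10566720 10350848   # prints 210
```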

@timsutton
Owner

If it's just a simple Packer timeout issue when doing the vmware-tools-cli shrink, I'm not surprised I didn't hit it. I've only tested it on a recent iMac i7 with the PCIe SSD. Perhaps there's another template setting we can modify to increase the timeout?

@lstoll
Contributor

lstoll commented May 17, 2015

Reading more, it doesn't seem to be the normal timeout, so I was wrong there. It also shouldn't be pure computer speed. I'm on a slightly newer 13" rMBP, so I wouldn't expect anything drastic there.

I'm running up a test on a Mac mini I have to see if I can reproduce this.

As for just zeroing it out, it isn't quite the same. Using vmware tools essentially just zeroes the disk, then re-packs the vmdk images. If you just zero out the disk the compressed box will be roughly the same size, but when you uncompress it it will take up substantially more space because the vmdk images will not have been compressed. This will also slow down clone times. One workaround we could try is zeroing the disk using diskutil, then using vmware-vdiskmanager to pack down the vmdk's. Looking at this now.

@timsutton
Owner

Agreed about the diff between vmware tools shrinking and just zeroing. I've often been frustrated by some long-running VMs I have which have slowly ballooned in size, so I should probably try something similar and see how much I can shrink them back down.

Wouldn't using vmware-vdiskmanager require you to run a plugin on the host? I thought I once ran across a plugin that runs a command on the host as a postprocessor.

@mitchty

mitchty commented May 18, 2015

Not really, you can defragment/shrink the vmdk files in place on your hosting os if you want, from an old script I used pre-packer for building arch linux boxes:
https://github.com/mitchty/vagrant-arch-setup/blob/master/arch-build-box-vmware_fusion.sh#L38-L42

@timsutton
Owner

Right, I should've said that more clearly. I just meant having to run something on the host as opposed to within the guest during the build process. Ideally there's nothing to do after Packer, especially if someone wants to use Packer's vagrant postprocessor, since this step would need to be done after the build but before the vagrant postprocessor kicks in.

@timsutton
Owner

Here's the plugin I was thinking of:

https://github.com/shaunduncan/packer-provisioner-host-command

@mitchty

mitchty commented May 18, 2015

Gotcha I thought you were referring more to the long running vm's. I just use find | xargs after a vm is shutdown to defrag/shrink the vmdk's if they start getting egregious in size.

@michel117

Is there a solution for this problem? I run into it all the time on my MacBook.
Cheers
Michel

@timsutton
Owner

I still haven't ever seen Packer actually fail a build because of the VMware shrink process. Not saying it doesn't happen, just that I haven't been able to reproduce it.

@michel117

michel117 commented Dec 4, 2015 via email

@timsutton
Owner

But a timeout in Packer? What is taking so long that a timeout gets invoked?

@michel117

Probably the shrinking. I removed the entry from template.json, and then I could build the box.
In general, though, I wanted to say that you did an awesome job with this template file. I have never built an OS X machine that easily!
Thank you!

@chocolatewheelchair

Host machine: OSX 10.10
Using the boxcutter templates, unadulterated.

Not sure it's the exact same issue, but I am consistently running into the following error for every OS X Packer build:

==> vmware-iso: Provisioning with shell script: script/minimize.sh
    vmware-iso: ==> Turn off hibernation
    vmware-iso: ==> Get rid of the sleepimage
    vmware-iso: ==> Stop the page process and dropping swap files
    vmware-iso: Zeroing out free space
    vmware-iso: Started erase on disk0s2 Macintosh HD
    vmware-iso: Creating a temporary file
    vmware-iso: Securely erasing a file
    vmware-iso: Creating a secondary temporary file
    vmware-iso: Mounting disk
    vmware-iso: Finished erase on disk0s2 Macintosh HD
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
    vmware-iso:
    vmware-iso: Progress: 100 [===========>]
==> vmware-iso: Stopping virtual machine...
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Retryable error: Error removing temporary script at /tmp/script_7943.sh: dial tcp 172.16.250.133:22: connect: host is down

So far the only workaround I have found is to remove the minimize.sh script altogether, but I would rather find a better solution if anyone has suggestions.

@chrmod

chrmod commented Jun 2, 2016

==> vmware-iso: Provisioning with shell script: ../scripts/shrink.sh
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
Progress: 99 [==========>]29 [===>       ]
==> vmware-iso: Stopping virtual machine...
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Retryable error: Error removing temporary script at /tmp/script_384.sh: dial tcp 192.168.109.129:22: i/o timeout

guest system: yosemite
host system: yosemite
packer: 0.9.0

The host machine is a Mac mini (Late 2014), 2.6 GHz i5, 8 GB DDR3, 1 TB HDD

@timsutton
Owner

If you remove the shrink.sh script does the build succeed?

I'd like it if we could instead use the local shell provisioner to issue the VMDK shrink using vmware-vdiskmanager, however I think that would require sudo on the Packer build machine.

@rickard-von-essen

Packer runs vmware-vdiskmanager twice, once with -d and once with -k and it doesn't require sudo.
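
Run by hand on the host, those two passes (defragment with -d, then shrink with -k) could be sketched as below. The Fusion install path, the VDISKMANAGER and DRY_RUN variables, and the compact_vmdk name are assumptions for illustration, not something Packer or the templates define.

```shell
#!/bin/bash
# Assumed default location of the tool inside a VMware Fusion install.
VDISKMANAGER="${VDISKMANAGER:-/Applications/VMware Fusion.app/Contents/Library/vmware-vdiskmanager}"

# compact_vmdk: defragment (-d) and then shrink (-k) one virtual disk.
# Pass the descriptor .vmdk of a powered-off VM. With DRY_RUN=1 the
# commands are printed instead of executed.
compact_vmdk() {
    local vmdk="$1" flag
    for flag in -d -k; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "vmware-vdiskmanager ${flag} ${vmdk}"
        else
            "$VDISKMANAGER" "$flag" "$vmdk" || return 1
        fi
    done
}

# Example (hypothetical path):
# compact_vmdk "output-vmware-iso/disk.vmdk"
```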

@timsutton
Owner

Oh, ok. I recalled using it to convert DMGs to VMDK, but that must be due to the elevated mechanisms required to do those particular operations.

@chrmod

chrmod commented Jun 3, 2016

with shrink.sh removed it completes successfully

chrmod added a commit to cliqz-oss/osx-vm-templates that referenced this issue Jun 7, 2016
from unknown reason shirinking operation tend to blow up: timsutton#35
@kbotnen

kbotnen commented Jul 28, 2016

Same problem. OS X 10.11 guest OS, 10.9 host OS.

Edited: Oh, and I use Fusion 8.5 at the moment. Not Fusion 7 as the original poster used.

I raised the disk_size to 50200 in template.json, and it crashes when shrinking. If I remove the shrinking part, it completes with a 21GB box (Xcode included). If I set disk_size to 40960 in template.json, it works with shrink.sh again, but the resulting box is the same size or even a bit larger: 23GB.

Is the shrinking thing working at all?

Anyway, I just wanted to comment that reducing disk_size might help, and that shrink.sh doesn't shrink my box at all.

@mafrosis

mafrosis commented Sep 5, 2016

Bonus data point: shrinking works fine for me on 10.11 building 10.11 guest. Defaults from template, VMware only.

@kbotnen

kbotnen commented Oct 13, 2016

10.12, building 10.12. Still not working :/

Edited: Oh, and I use Fusion 8.5 at the moment. Not Fusion 7 as the original poster used.

==> vmware-iso: Provisioning with shell script: ../scripts/chef-omnibus.sh
==> vmware-iso: Provisioning with shell script: ../scripts/puppet.sh
==> vmware-iso: Provisioning with shell script: ../scripts/add-network-interface-detection.sh
==> vmware-iso: Provisioning with shell script: ../scripts/autologin.sh
==> vmware-iso: Provisioning with shell script: ../scripts/shrink.sh
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
    vmware-iso: Progress: 100 [===========>]
    vmware-iso:
==> vmware-iso: Stopping virtual machine...
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Retryable error: Error removing temporary script at /tmp/script_46.sh: dial tcp 192.168.223.191:22: i/o timeout
==> Some builds didn't complete successfully and had errors:
--> vmware-iso: Retryable error: Error removing temporary script at /tmp/script_46.sh: dial tcp 192.168.223.191:22: i/o timeout
==> Builds finished but no artifacts were created.

@lox

lox commented May 24, 2017

I'm also having this issue with 10.12.3, building 10.12.3. Fusion 8.6.5:

    vmware-iso: + '/Library/Application Support/VMware Tools/vmware-tools-cli' disk shrink /
    vmware-iso: Please disregard any warnings about disk space for the duration of shrink process.
    vmware-iso: Progress: 100 [===========>]
    vmware-iso:
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Retryable error: Error removing temporary script at /tmp/script_7476.sh: dial tcp 172.16.11.218:22: connect: host is down
