Voodoo Bug

chazy edited this page Nov 23, 2012 · 6 revisions

This has been fixed: https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004172.html

What is the voodoo bug?

The voodoo bug is a bug in KVM/ARM or related software that causes random kernel panics in VMs when the host machine is swapping memory.

How to introduce the voodoo bug?

Assuming you are running on a host with 2GB of physical memory, do the following:

  1. Check out the kvm-arm-master branch from here: git://github.com/virtualopensystems/linux-kvm-arm.git
  2. Find the line in arch/arm/kvm/mmu.c that says "kvm_release_pfn_dirty(pfn);" and change it to "kvm_release_pfn_clean(pfn);"
  3. Compile this kernel and fire up your favorite KVM/ARM system using that kernel
  4. Enable swapping on the host:
     $ swapon /dev/sdX # (or create a loopback device to swapon)
  5. Create a VM with 512 MB of RAM.
  6. Start the VM, and inside the VM:
    1. $ git clone git://github.com/chazy/mtest.git
    2. $ cd mtest
    3. $ git checkout guest-test
    4. $ make
    5. $ ./mtest 450
  7. Back in the host, do:
    1. $ mkdir /mnt/ramfs
    2. $ mount -t ramfs none /mnt/ramfs
    3. $ dd if=/dev/zero of=/mnt/ramfs/foo bs=1M count=1200
    4. $ git clone git://github.com/chazy/mtest.git
    5. $ cd mtest
    6. $ git checkout guest-test
    7. $ make
    8. $ ./mtest 600

Now wait for a little bit, and the guest kernel will go boom!

What has been tried?

We have tried all of these things with no progress:

  • Compile host and guest kernel as non-smp kernels
  • Disable kernel preemption on both host and guest
  • Disable highmem on the host
  • Set stage2 translations to make memory non-cacheable
  • Remove logic that frees stage2 page tables
  • Flush caches and TLBs on every world switch
  • Manually invalidate TLBs using an IPI instead of relying on inner-shareable invalidation
  • Call set_page_dirty_lock instead of kvm_set_pfn_dirty for writable pages
  • Run without VGIC and architected timer support

Other interesting observations

The bug only happens when the host begins to swap.

Christoffer traced through a number of the guest kernel crashes. In one example, assert_raw_spin_locked(&task_rq(p)->lock) failed in resched_task(p), even though the call stack clearly showed that the code had gone through a function that locks the runqueue lock. This would indicate that a write is somehow lost (a write that coincidentally came from a strex). Other simple null pointer exceptions are also typically observed, for example during linked list traversals.

The bug only happens in kernel space. The mtest program runs through a loop of around 450MB and reads/writes every single word to test for consistency, and we have never seen this fail. The bug is somehow memory related, yet we always see it in the kernel, which accesses only roughly 10MB. So what does the kernel do differently from user space to provoke this bug?
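
For readers without the mtest sources at hand, the kind of user-space check described above can be sketched roughly as follows. This is not the actual mtest code: the function names check_memory and pattern, the pattern formula, and the pass count are all assumptions made for illustration. It only shows the general idea of writing every word of a large buffer and reading it back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pattern (not taken from mtest): mixes the word index with
 * the pass number so that stale data from an earlier pass is detectable. */
static uint32_t pattern(size_t index, uint32_t pass)
{
    return (uint32_t)index * 2654435761u ^ pass;
}

/* Write a pattern to every 32-bit word of an mbytes-sized buffer, read it
 * back, and return the number of mismatching words (-1 if malloc fails). */
long check_memory(size_t mbytes, uint32_t passes)
{
    assert(mbytes > 0 && passes > 0);

    size_t words = mbytes * 1024 * 1024 / sizeof(uint32_t);
    /* volatile keeps the compiler from optimizing the read-back away */
    volatile uint32_t *buf = malloc(words * sizeof(uint32_t));
    long errors = 0;

    if (buf == NULL)
        return -1;

    for (uint32_t pass = 1; pass <= passes; pass++) {
        for (size_t i = 0; i < words; i++)
            buf[i] = pattern(i, pass);

        for (size_t i = 0; i < words; i++) {
            uint32_t got = buf[i];
            if (got != pattern(i, pass)) {
                fprintf(stderr, "mismatch at word %zu: got %lx, expected %lx\n",
                        i, (unsigned long)got,
                        (unsigned long)pattern(i, pass));
                errors++;
            }
        }
    }

    free((void *)buf);
    return errors;
}
```

Calling check_memory(450, 1) from a small main would touch about the same amount of memory as ./mtest 450. Whether the real mtest uses a comparable pattern or loops indefinitely is not something this sketch claims to reproduce.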

Example guest kernel crashes

(These do not necessarily indicate that the bug looks different with no-preemption; it's more random than that.)