Commit Graph

1190 Commits

Author SHA1 Message Date
Radim Krčmář
1b1973ef9a Merge tag 'kvm-arm-for-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for 4.10-rc4

- Fix for timer setup on VHE machines
- Drop spurious warning when the timer races against
  the vcpu running again
- Prevent a vgic deadlock when the initialization fails
2017-01-17 15:04:59 +01:00
Marc Zyngier
1193e6aeec KVM: arm/arm64: vgic: Fix deadlock on error handling
Dmitry Vyukov reported that the syzkaller fuzzer triggered a
deadlock in the vgic setup code when an error was detected, as
the cleanup code tries to take a lock that is already held by
the setup code.

The fix is to avoid retaking the lock when cleaning up, by
telling the cleanup function that we already hold it.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13 11:19:35 +00:00
Jintack Lim
488f94d721 KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
Current KVM world switch code is unintentionally setting wrong bits to
CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical
timer.  Bit positions of CNTHCTL_EL2 are changing depending on
HCR_EL2.E2H bit.  EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is
not set, but they are 11th and 10th bits respectively when E2H is set.

In fact, on VHE we only need to set those bits once, not for every world
switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
1, which makes those bits have no effect for the host kernel execution.
So we just set those bits once for guests, and that's it.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13 11:19:25 +00:00
Christoffer Dall
63e41226af KVM: arm/arm64: Fix occasional warning from the timer work function
When a VCPU blocks (WFI) and has programmed the vtimer, we program a
soft timer to expire in the future to wake up the vcpu thread when
appropriate.  Because such as wake up involves a vcpu kick, and the
timer expire function can get called from interrupt context, and the
kick may sleep, we have to schedule the kick in the work function.

The work function currently has a warning that gets raised if it turns
out that the timer shouldn't fire when it's run, which was added because
the idea was that in that case the work should never have been cancelled.

However, it turns out that this whole thing is racy and we can get
spurious warnings.  The problem is that we clear the armed flag in the
work function, which may run in parallel with the
kvm_timer_unschedule->timer_disarm() call.  This results in a possible
situation where the timer_disarm() call does not call
cancel_work_sync(), which effectively synchronizes the completion of the
work function with running the VCPU.  As a result, the VCPU thread
proceeds before the work function completees, causing changes to the
timer state such that kvm_timer_should_fire(vcpu) returns false in the
work function.

All we do in the work function is to kick the VCPU, and an occasional
rare extra kick never harmed anyone.  Since the race above is extremely
rare, we don't bother checking if the race happens but simply remove the
check and the clearing of the armed flag from the work function.

Reported-by: Matthias Brugger <mbrugger@suse.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13 11:15:30 +00:00
Wanpeng Li
4f3dbdf47e KVM: eventfd: fix NULL deref irqbypass consumer
Reported syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12 14:42:34 +01:00
Linus Torvalds
3ddc76dfc7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
 "This series does a tree wide cleanup of types related to
  timers/timekeeping.

   - Get rid of cycles_t and use a plain u64. The type is not really
     helpful and caused more confusion than clarity

   - Get rid of the ktime union. The union has become useless as we use
     the scalar nanoseconds storage unconditionally now. The 32bit
     timespec alike storage got removed due to the Y2038 limitations
     some time ago.

     That leaves the odd union access around for no reason. Clean it up.

  Both changes have been done with coccinelle and a small amount of
  manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Get rid of ktime_equal()
  ktime: Cleanup ktime_set() usage
  ktime: Get rid of the union
  clocksource: Use a plain u64 instead of cycle_t
2016-12-25 14:30:04 -08:00
Linus Torvalds
b272f732f8 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug notifier removal from Thomas Gleixner:
 "This is the final cleanup of the hotplug notifier infrastructure. The
  series has been reintgrated in the last two days because there came a
  new driver using the old infrastructure via the SCSI tree.

  Summary:

   - convert the last leftover drivers utilizing notifiers

   - fixup for a completely broken hotplug user

   - prevent setup of already used states

   - removal of the notifiers

   - treewide cleanup of hotplug state names

   - consolidation of state space

  There is a sphinx based documentation pending, but that needs review
  from the documentation folks"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/armada-xp: Consolidate hotplug state space
  irqchip/gic: Consolidate hotplug state space
  coresight/etm3/4x: Consolidate hotplug state space
  cpu/hotplug: Cleanup state names
  cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
  staging/lustre/libcfs: Convert to hotplug state machine
  scsi/bnx2i: Convert to hotplug state machine
  scsi/bnx2fc: Convert to hotplug state machine
  cpu/hotplug: Prevent overwriting of callbacks
  x86/msr: Remove bogus cleanup from the error path
  bus: arm-ccn: Prevent hotplug callback leak
  perf/x86/intel/cstate: Prevent hotplug callback leak
  ARM/imx/mmcd: Fix broken cpu hotplug handling
  scsi: qedi: Convert to hotplug state machine
2016-12-25 14:05:56 -08:00
Thomas Gleixner
a5a1d1c291 clocksource: Use a plain u64 instead of cycle_t
There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
2016-12-25 11:04:12 +01:00
Thomas Gleixner
73c1b41e63 cpu/hotplug: Cleanup state names
When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.

Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-25 10:47:44 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Lorenzo Stoakes
8b7457ef9a mm: unexport __get_user_pages_unlocked()
Unexport the low-level __get_user_pages_unlocked() function and replaces
invocations with calls to more appropriate higher-level functions.

In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked()
with get_user_pages_unlocked() since we can now pass gup_flags.

In async_pf_execute() and process_vm_rw_single_vec() we need to pass
different tsk, mm arguments so get_user_pages_remote() is the sane
replacement in these cases (having added manual acquisition and release
of mmap_sem.)

Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH
flag.  However, this flag was originally silently dropped by commit
1e9877902d ("mm/gup: Introduce get_user_pages_remote()"), so this
appears to have been unintentional and reintroducing it is therefore not
an issue.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:09 -08:00
Linus Torvalds
93173b5bf2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "Small release, the most interesting stuff is x86 nested virt
  improvements.

  x86:
   - userspace can now hide nested VMX features from guests
   - nested VMX can now run Hyper-V in a guest
   - support for AVX512_4VNNIW and AVX512_FMAPS in KVM
   - infrastructure support for virtual Intel GPUs.

  PPC:
   - support for KVM guests on POWER9
   - improved support for interrupt polling
   - optimizations and cleanups.

  s390:
   - two small optimizations, more stuff is in flight and will be in
     4.11.

  ARM:
   - support for the GICv3 ITS on 32bit platforms"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
  KVM: arm/arm64: timer: Check for properly initialized timer on init
  KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
  KVM: x86: Handle the kthread worker using the new API
  KVM: nVMX: invvpid handling improvements
  KVM: nVMX: check host CR3 on vmentry and vmexit
  KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
  KVM: nVMX: propagate errors from prepare_vmcs02
  KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
  KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
  KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
  KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
  KVM: nVMX: support restore of VMX capability MSRs
  KVM: nVMX: generate non-true VMX MSRs based on true versions
  KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
  KVM: x86: Add kvm_skip_emulated_instruction and use it.
  KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
  KVM: VMX: Reorder some skip_emulated_instruction calls
  KVM: x86: Add a return value to kvm_emulate_cpuid
  KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
  ...
2016-12-13 15:47:02 -08:00
Linus Torvalds
edc5f445a6 Merge tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - VFIO updates for v4.10 primarily include a new Mediated Device
   interface, which essentially allows software defined devices to be
   exposed to users through VFIO. The host vendor driver providing this
   virtual device polices, or mediates user access to the device.

   These devices often incorporate portions of real devices, for
   instance the primary initial users of this interface expose vGPUs
   which allow the user to map mediated devices, or mdevs, to a portion
   of a physical GPU. QEMU composes these mdevs into PCI representations
   using the existing VFIO user API. This enables both Intel KVM-GT
   support, which is also expected to arrive into Linux mainline during
   the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O
   devices (aka CCW devices) for s390 virtualization support. (Kirti
   Wankhede, Neo Jia)

 - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin)

 - Fixes to VFIO capability chain handling (Eric Auger)

 - Error handling fixes for fallout from mdev (Christophe JAILLET)

 - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song)

 - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia)

* tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
  vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages
  vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP.
  vfio iommu type1: WARN_ON if notifier block is not unregistered
  kvm: set/clear kvm to/from vfio_group when group add/delete
  vfio: support notifier chain in vfio_group
  vfio: vfio_register_notifier: classify iommu notifier
  vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'
  vfio: fix vfio_info_cap_add/shift
  vfio/pci: Drop unnecessary pcibios_err_to_errno()
  MAINTAINERS: Add entry VFIO based Mediated device drivers
  docs: Sample driver to demonstrate how to use Mediated device framework.
  docs: Sysfs ABI for mediated device framework
  docs: Add Documentation for Mediated devices
  vfio: Define device_api strings
  vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio: Introduce vfio_set_irqs_validate_and_prepare()
  vfio_pci: Update vfio_pci to use vfio_info_add_capability()
  vfio: Introduce common function to add capabilities
  vfio iommu: Add blocking notifier to notify DMA_UNMAP
  ...
2016-12-13 09:23:56 -08:00
Paolo Bonzini
f673b5b2a6 Merge tag 'kvm-arm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for 4.10:

- Support for the GICv3 ITS on 32bit platforms
- A handful of timer and GIC emulation fixes
- A PMU architecture fix
2016-12-12 07:29:39 +01:00
Ingo Molnar
6f38751510 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11 13:07:13 +01:00
Christoffer Dall
8e1a0476f8 KVM: arm/arm64: timer: Check for properly initialized timer on init
When the arch timer code fails to initialize (for example because the
memory mapped timer doesn't work, which is currently seen with the AEM
model), then KVM just continues happily with a final result that KVM
eventually does a NULL pointer dereference of the uninitialized cycle
counter.

Check directly for this in the init path and give the user a reasonable
error in this case.

Cc: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-09 15:47:00 +00:00
Andre Przywara
266068eabb KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
The GICv2 spec says in section 4.3.12 that a "CPU targets field bit that
corresponds to an unimplemented CPU interface is RAZ/WI."
Currently we allow the guest to write any value in there and it can
read that back.
Mask the written value with the proper CPU mask to be spec compliant.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-09 15:46:59 +00:00
Jike Song
2fc1bec158 kvm: set/clear kvm to/from vfio_group when group add/delete
Sometimes users need to be aware when a vfio_group attaches to a
KVM or detaches from it. KVM already calls get/put method from vfio to
manipulate the vfio_group reference, it can notify vfio_group in
a similar way.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01 10:42:17 -07:00
Dan Carpenter
a0f1d21c1c KVM: use after free in kvm_ioctl_create_device()
We should move the ops->destroy(dev) after the list_del(&dev->vm_node)
so that we don't use "dev" after freeing it.

Fixes: a28ebea2ad ("KVM: Protect device ops->create and list_add with kvm->lock")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-01 16:10:50 +01:00
Radim Krčmář
0f4828a1da Merge tag 'kvm-arm-for-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc7

- Do not call kvm_notify_acked for PPIs
2016-12-01 14:56:34 +01:00
Suraj Jitindar Singh
ec76d819d2 KVM: Export kvm module parameter variables
The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and
halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv
module and these parameters were essentially duplicated for that. There is
no benefit to this duplication and it can lead to confusion when trying to
tune halt polling.

Thus move the definition of these variables to kvm_host.h and export them.
This will allow the kvm-hv module to use the same module parameters by
accessing these variables, which will be implemented in the next patch,
meaning that they will no longer be duplicated.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-28 11:48:47 +11:00
Marc Zyngier
8ca18eec2b KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
When we inject a level triggerered interrupt (and unless it
is backed by the physical distributor - timer style), we request
a maintenance interrupt. Part of the processing for that interrupt
is to feed to the rest of KVM (and to the eventfd subsystem) the
information that the interrupt has been EOIed.

But that notification only makes sense for SPIs, and not PPIs
(such as the PMU interrupt). Skip over the notification if
the interrupt is not an SPI.

Cc: stable@vger.kernel.org # 4.7+
Fixes: 140b086dd1 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
Fixes: 59529f69f5 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-24 13:12:07 +00:00
Pan Xinhui
4ec6e86362 kvm: Introduce kvm_write_guest_offset_cached()
It allows us to update some status or field of a structure partially.

We can also save a kvm_read_guest_cached() call if we just update one
fild of the struct regardless of its current value.

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: borntraeger@de.ibm.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: jgross@suse.com
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 12:48:07 +01:00
Paolo Bonzini
22583f0d9c KVM: async_pf: avoid recursive flushing of work items
This was reported by syzkaller:

    [ INFO: possible recursive locking detected ]
    4.9.0-rc4+ #49 Not tainted
    ---------------------------------------------
    kworker/2:1/5658 is trying to acquire lock:
     ([ 1644.769018] (&work->work)
    [<     inline     >] list_empty include/linux/compiler.h:243
    [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511

    but task is already holding lock:
     ([ 1644.769018] (&work->work)
    [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093

    stack backtrace:
    CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events async_pf_execute
     ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
     0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
     ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
    Call Trace:
    ...
    [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846
    [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
    [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
    [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
    [<     inline     >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
    [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
    [<     inline     >] kvm_destroy_vm virt/kvm/kvm_main.c:731
    [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
    [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
    [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
    [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
    [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209
    [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

The reason is that kvm_put_kvm is causing the destruction of the VM, but
the page fault is still on the ->queue list.  The ->queue list is owned
by the VCPU, not by the work items, so we cannot just add list_del to
the work item.

Instead, use work->vcpu to note async page faults that have been resolved
and will be processed through the done list.  There is no need to flush
those.

Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:17 +01:00
Radim Krčmář
e5dbc4bf0b Merge tag 'kvm-arm-for-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc6

- Fix handling of the 32bit cycle counter
- Fix cycle counter filtering
2016-11-19 18:02:07 +01:00