Commit Graph

1162 Commits

Author SHA1 Message Date
Paolo Bonzini
22583f0d9c KVM: async_pf: avoid recursive flushing of work items
This was reported by syzkaller:

    [ INFO: possible recursive locking detected ]
    4.9.0-rc4+ #49 Not tainted
    ---------------------------------------------
    kworker/2:1/5658 is trying to acquire lock:
     ([ 1644.769018] (&work->work)
    [<     inline     >] list_empty include/linux/compiler.h:243
    [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511

    but task is already holding lock:
     ([ 1644.769018] (&work->work)
    [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093

    stack backtrace:
    CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events async_pf_execute
     ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
     0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
     ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
    Call Trace:
    ...
    [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846
    [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
    [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
    [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
    [<     inline     >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
    [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
    [<     inline     >] kvm_destroy_vm virt/kvm/kvm_main.c:731
    [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
    [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
    [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
    [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
    [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209
    [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

The reason is that kvm_put_kvm is causing the destruction of the VM, but
the page fault is still on the ->queue list.  The ->queue list is owned
by the VCPU, not by the work items, so we cannot just add list_del to
the work item.

Instead, use work->vcpu to note async page faults that have been resolved
and will be processed through the done list.  There is no need to flush
those.

Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:17 +01:00
Radim Krčmář
e5dbc4bf0b Merge tag 'kvm-arm-for-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc6

- Fix handling of the 32bit cycle counter
- Fix cycle counter filtering
2016-11-19 18:02:07 +01:00
Wei Huang
b112c84a6f KVM: arm64: Fix the issues when guest PMCCFILTR is configured
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
But this function can't deals with PMCCFILTR correctly because the evtCount
bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event
type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this
function shouldn't return immediately; instead it needs to check further
if select_idx is ARMV8_PMU_CYCLE_IDX.

Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER
blindly to attr.config. Instead it ought to convert the request to the
"cpu cycle" event type (i.e. 0x11).

To support this patch and to prevent duplicated definitions, a limited
set of ARMv8 perf event types were relocated from perf_event.c to
asm/perf_event.h.

Cc: stable@vger.kernel.org # 4.6+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-18 09:06:58 +00:00
Paolo Bonzini
05d36a7dff Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for v4.9-rc4

- Kick the vcpu when a pending interrupt becomes pending again
- Prevent access to invalid interrupt registers
- Invalid TLBs when two vcpus from the same VM share a CPU
2016-11-11 11:13:36 +01:00
Linus Torvalds
66cecb6789 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "One NULL pointer dereference, and two fixes for regressions introduced
  during the merge window.

  The rest are fixes for MIPS, s390 and nested VMX"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Check memopp before dereference (CVE-2016-8630)
  kvm: nVMX: VMCLEAR an active shadow VMCS after last use
  KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
  KVM: x86: fix wbinvd_dirty_mask use-after-free
  kvm/x86: Show WRMSR data is in hex
  kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types
  KVM: document lock orders
  KVM: fix OOPS on flush_work
  KVM: s390: Fix STHYI buffer alignment for diag224
  KVM: MIPS: Precalculate MMIO load resume PC
  KVM: MIPS: Make ERET handle ERL before EXL
  KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04 13:08:05 -07:00
Shih-Wei Li
d42c79701a KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQs
In cases like IPI, we could be queueing an interrupt for a VCPU
that is already running and is not about to exit, because the
VCPU has entered the VM with the interrupt pending and would
not trap on EOI'ing that interrupt. This could result to delays
in interrupt deliveries or even loss of interrupts.
To guarantee prompt interrupt injection, here we have to try to
kick the VCPU.

Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04 17:56:56 +00:00
Andre Przywara
112b0b8f8f KVM: arm/arm64: vgic: Prevent access to invalid SPIs
In our VGIC implementation we limit the number of SPIs to a number
that the userland application told us. Accordingly we limit the
allocation of memory for virtual IRQs to that number.
However in our MMIO dispatcher we didn't check if we ever access an
IRQ beyond that limit, leading to out-of-bound accesses.
Add a test against the number of allocated SPIs in check_region().
Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which
is not implemented on ARM(32).

[maz: cleaned-up original patch]

Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04 17:56:54 +00:00
Paolo Bonzini
36343f6ea7 KVM: fix OOPS on flush_work
The conversion done by commit 3706feacd0 ("KVM: Remove deprecated
create_singlethread_workqueue") is broken.  It flushes a single work
item &irqfd->shutdown instead of all of them, and even worse if there
is no irqfd on the list then you get a NULL pointer dereference.
Revert the virt/kvm/eventfd.c part of that patch; to avoid the
deprecated function, just allocate our own workqueue---it does
not even have to be unbound---with alloc_workqueue.

Fixes: 3706feacd0
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-26 14:06:51 +02:00
Lorenzo Stoakes
0d73175982 mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.

Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.

We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:

  get_user_page_nowait():
    get_user_pages(start, 1, flags, page, NULL)
    __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, start, 1,
		     flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

  check_user_page_hwpoison():
    get_user_pages(addr, 1, flags, NULL, NULL)
    __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
		     NULL, NULL)

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 19:13:20 -07:00
Lorenzo Stoakes
d4944b0ece mm: remove write/force parameters from __get_user_pages_unlocked()
This removes the redundant 'write' and 'force' parameters from
__get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18 14:13:37 -07:00
Radim Krčmář
45ca877ad0 Merge tag 'kvm-arm-for-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next
KVM/ARM Changes for v4.9

 - Various cleanups and removal of redundant code
 - Two important fixes for not using an in-kernel irqchip
 - A bit of optimizations
 - Handle SError exceptions and present them to guests if appropriate
 - Proxying of GICV access at EL2 if guest mappings are unsafe
 - GICv3 on AArch32 on ARMv8
 - Preparations for GICv3 save/restore, including ABI docs
2016-09-29 16:01:51 +02:00
Christoffer Dall
0099b7701f KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
If the vgic hasn't been created and initialized, we shouldn't attempt to
look at its data structures or flush/sync anything to the GIC hardware.

This fixes an issue reported by Alexander Graf when using a userspace
irqchip.

Fixes: 0919e84c0f ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Cc: stable@vger.kernel.org
Reported-by: Alexander Graf <agraf@suse.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-27 18:57:35 +02:00
Christoffer Dall
6fe407f2d1 KVM: arm64: Require in-kernel irqchip for PMU support
If userspace creates a PMU for the VCPU, but doesn't create an in-kernel
irqchip, then we end up in a nasty path where we try to take an
uninitialized spinlock, which can lead to all sorts of breakages.

Luckily, QEMU always creates the VGIC before the PMU, so we can
establish this as ABI and check for the VGIC in the PMU init stage.
This can be relaxed at a later time if we want to support PMU with a
userspace irqchip.

Cc: stable@vger.kernel.org
Cc: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-27 18:57:07 +02:00
Vladimir Murzin
acda5430be ARM: KVM: Support vgic-v3
This patch allows to build and use vgic-v3 in 32-bit mode.

Unfortunately, it can not be split in several steps without extra
stubs to keep patches independent and bisectable.  For instance,
virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling
access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
to be already defined.

It is how support has been done:

* handle SGI requests from the guest

* report configured SRE on access to GICv3 cpu interface from the guest

* required vgic-v3 macros are provided via uapi.h

* static keys are used to select GIC backend

* to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
  the static inlines

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:22:21 +02:00
Vladimir Murzin
d7d0a11e44 KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systems
We have couple of 64-bit registers defined in GICv3 architecture, so
unsigned long accesses to these registers will only access a single
32-bit part of that regitser. On the other hand these registers can't
be accessed as 64-bit with a single instruction like ldrd/strd or
ldmia/stmia if we run a 32-bit host because KVM does not support
access to MMIO space done by these instructions.

It means that a 32-bit guest accesses these registers in 32-bit
chunks, so the only thing we need to do is to ensure that
extract_bytes() always takes 64-bit data.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:21:59 +02:00
Vladimir Murzin
e533a37f7b KVM: arm: vgic: Fix compiler warnings when built for 32-bit
Well, this patch is looking ahead of time, but we'll get following
compiler warnings as soon as we introduce vgic-v3 to 32-bit world

  CC      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o
arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer':
arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow]
  value = (mpidr & GENMASK(23, 0)) << 32;
                                   ^
In file included from ./include/linux/kernel.h:10:0,
                 from ./include/asm-generic/bug.h:13,
                 from ./arch/arm/include/asm/bug.h:59,
                 from ./include/linux/bug.h:4,
                 from ./include/linux/io.h:23,
                 from ./arch/arm/include/asm/arch_gicv3.h:23,
                 from ./include/linux/irqchip/arm-gic-v3.h:411,
                 from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14:
arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi':
./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow]
 #define BIT(nr)   (1UL << (nr))
                        ^
arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT'
  broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
                    ^
Let's fix them now.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:21:48 +02:00
Vladimir Murzin
7a1ff70828 KVM: arm64: vgic-its: Introduce config option to guard ITS specific code
By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was
introduced to hide everything specific to vgic-v3 from 32-bit world.
We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3
will gone, but we don't have support for ITS there yet and we need to
continue keeping ITS away.
Introduce the new config option to prevent ITS code being build in
32-bit mode when support for vgic-v3 is done.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:21:47 +02:00
Vladimir Murzin
19f0ece439 arm64: KVM: Move vgic-v3 save/restore to virt/kvm/arm/hyp
So we can reuse the code under arch/arm

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:21:46 +02:00
Vladimir Murzin
5a7a8426b2 arm64: KVM: Use static keys for selecting the GIC backend
Currently GIC backend is selected via alternative framework and this
is fine. We are going to introduce vgic-v3 to 32-bit world and there
we don't have patching framework in hand, so we can either check
support for GICv3 every time we need to choose which backend to use or
try to optimise it by using static keys. The later looks quite
promising because we can share logic involved in selecting GIC backend
between architectures if both uses static keys.

This patch moves arm64 from alternative to static keys framework for
selecting GIC backend. For that we embed static key into vgic_global
and enable the key during vgic initialisation based on what has
already been exposed by the host GIC driver.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:21:35 +02:00
Luiz Capitulino
45b5939e50 kvm: create per-vcpu dirs in debugfs
This commit adds the ability for archs to export
per-vcpu information via a new per-vcpu dir in
the VM's debugfs directory.

If kvm_arch_has_vcpu_debugfs() returns true, then KVM
will create a vcpu dir for each vCPU in the VM's
debugfs directory. Then kvm_arch_create_vcpu_debugfs()
is responsible for populating each vcpu directory
with arch specific entries.

The per-vcpu path in debugfs will look like:

/sys/kernel/debug/kvm/29162-10/vcpu0
/sys/kernel/debug/kvm/29162-10/vcpu1

This is all arch specific for now because the only
user of this interface (x86) wants to export x86-specific
per-vcpu information to user-space.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16 16:57:47 +02:00
Luiz Capitulino
9d5a1dcebf kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointer
This make it possible to call kvm_destroy_vm_debugfs() from
kvm_create_vm_debugfs() in error conditions.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16 16:57:46 +02:00
Paolo Bonzini
ad53e35ae5 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
Paul Mackerras writes:

    The highlights are:

    * Reduced latency for interrupts from PCI pass-through devices, from
      Suresh Warrier and me.
    * Halt-polling implementation from Suraj Jitindar Singh.
    * 64-bit VCPU statistics, also from Suraj.
    * Various other minor fixes and improvements.
2016-09-13 15:20:55 +02:00
Paolo Bonzini
5d947a1447 KVM: ARM: cleanup kvm_timer_hyp_init
Remove two unnecessary labels now that kvm_timer_hyp_init is not
creating its own workqueue anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:54:00 +02:00
Marc Zyngier
3272f0d08e arm64: KVM: Inject a vSerror if detecting a bad GICV access at EL2
If, when proxying a GICV access at EL2, we detect that the guest is
doing something silly, report an EL1 SError instead ofgnoring the
access.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
a07d3b07a8 arm64: KVM: vgic-v2: Enable GICV access from HYP if access from guest is unsafe
So far, we've been disabling KVM on systems where the GICV region couldn't
be safely given to a guest. Now that we're able to handle this access
safely by emulating it in HYP, we can enable this feature when we detect
an unsafe configuration.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00