Commit Graph

847 Commits

Author SHA1 Message Date
Paolo Bonzini
2fa462f826 Merge tag 'kvm-arm-for-4.1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM changes for v4.1, take #2:

Rather small this time:

- a fix for a nasty bug with virtual IRQ injection
- a fix for irqfd
2015-04-22 17:08:12 +02:00
Andre Przywara
fd1d0ddf2a KVM: arm/arm64: check IRQ number on userland injection
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when now a malicious or buggy userland injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc07652e000
[00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
.....

So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-04-22 15:42:24 +01:00
Eric Auger
0b3289ebc2 KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
irqfd/arm curently does not support routing. kvm_irq_map_gsi is
supposed to return all the routing entries associated with the
provided gsi and return the number of those entries. We should
return 0 at this point.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-04-22 15:37:54 +01:00
Paul Mackerras
e23a808b16 KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vmnnnn, where nnnn is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third entry contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address.  (The format of the
second through fourth fields are described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:31 +02:00
Linus Torvalds
9003601310 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64:
     fixes for live migration, irqfd and ioeventfd support (enabling
     vhost, too), page aging

  s390:
     interrupt handling rework, allowing to inject all local interrupts
     via new ioctl and to get/set the full local irq state for migration
     and introspection.  New ioctls to access memory by virtual address,
     and to get/set the guest storage keys.  SIMD support.

  MIPS:
     FPU and MIPS SIMD Architecture (MSA) support.  Includes some
     patches from Ralf Baechle's MIPS tree.

  x86:
     bugfixes (notably for pvclock, the others are small) and cleanups.
     Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...
2015-04-13 09:47:01 -07:00
Radim Krčmář
ca3f087472 KVM: use slowpath for cross page cached accesses
kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses.  Fix all the easy way.

The check is '<= 1' to have the same result for 'len = 0' cache anywhere
in the page.  (nr_pages_needed is 0 on page boundary.)

Fixes: 8f964525a1 ("KVM: Allow cross page reads and writes from cached translations.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <20150408121648.GA3519@potion.brq.redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-10 16:04:45 +02:00
Paolo Bonzini
3180a7fcbc KVM: remove kvm_read_hva and kvm_read_hva_atomic
The corresponding write functions just use __copy_to_user.  Do the
same on the read side.

This reverts what's left of commit 86ab8cffb4 (KVM: introduce
gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21)

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
2015-04-08 10:46:54 +02:00
Paolo Bonzini
7f22b45d66 Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Features and fixes for 4.1 (kvm/next)

1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts

2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
    and introspection
2015-04-07 18:10:03 +02:00
Paolo Bonzini
bf0fb67cf9 Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next'
KVM/ARM changes for v4.1:

- fixes for live migration
- irqfd support
- kvm-io-bus & vgic rework to enable ioeventfd
- page ageing for stage-2 translation
- various cleanups
2015-04-07 18:09:20 +02:00
Paolo Bonzini
8999602d08 Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next'
Fixes for KVM/ARM for 4.0-rc5.

Fixes page refcounting issues in our Stage-2 page table management code,
fixes a missing unlock in a gicv3 error path, and fixes a race that can
cause lost interrupts if signals are pending just prior to entering the
guest.
2015-04-07 18:06:01 +02:00
Jens Freimann
47b43c52ee KVM: s390: add ioctl to inject local interrupts
We have introduced struct kvm_s390_irq a while ago which allows to
inject all kinds of interrupts as defined in the Principles of
Operation.
Add ioctl to inject interrupts with the extended struct kvm_s390_irq

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-03-31 21:07:30 +02:00
Andre Przywara
950324ab81 KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
data to be passed on from syndrome decoding all the way down to the
VGIC register handlers. Now as we switch the MMIO handling to be
routed through the KVM MMIO bus, it does not make sense anymore to
use that structure already from the beginning. So we keep the data in
local variables until we put them into the kvm_io_bus framework.
Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
structure. On that way we replace the data buffer in that structure
with a pointer pointing to a single location in a local variable, so
we get rid of some copying on the way.
With all of the virtual GIC emulation code now being registered with
the kvm_io_bus, we can remove all of the old MMIO handling code and
its dispatching functionality.

I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that touches a lot of code lines without any good reason.

This is based on an original patch by Nikolay.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-30 17:07:19 +01:00
Andre Przywara
fb8f61abab KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling
Using the framework provided by the recent vgic.c changes, we
register a kvm_io_bus device on mapping the virtual GICv3 resources.
The distributor mapping is pretty straight forward, but the
redistributors need some more love, since they need to be tagged with
the respective redistributor (read: VCPU) they are connected with.
We use the kvm_io_bus framework to register one devices per VCPU.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-30 17:07:13 +01:00
Andre Przywara
0ba10d5392 KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames
Currently we handle the redistributor registers in two separate MMIO
regions, one for the overall behaviour and SPIs and one for the
SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus
devices for each redistributor.
Since the spec mandates those two pages to be contigious, we could as
well merge them and save the churn with the second KVM I/O bus device.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-30 17:07:08 +01:00
Andre Przywara
a9cf86f62b KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus
Using the framework provided by the recent vgic.c changes we register
a kvm_io_bus device when initializing the virtual GICv2.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:15 +00:00
Andre Przywara
6777f77f0f KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC
Currently we use a lot of VGIC specific code to do the MMIO
dispatching.
Use the previous reworks to add kvm_io_bus style MMIO handlers.

Those are not yet called by the MMIO abort handler, also the actual
VGIC emulator function do not make use of it yet, but will be enabled
with the following patches.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:14 +00:00
Andre Przywara
9f199d0a0e KVM: arm/arm64: simplify vgic_find_range() and callers
The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio
argument, but actually only used the length field in there. Since we
need to get rid of that structure in that part of the code anyway,
let's rework the function (and it's callers) to pass the length
argument to the function directly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:14 +00:00
Andre Przywara
cf50a1eb43 KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_range
The name "kvm_mmio_range" is a bit bold, given that it only covers
the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename
it to vgic_io_range.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:14 +00:00
Andre Przywara
af669ac6dc KVM: move iodev.h from virt/kvm/ to include/kvm
iodev.h contains definitions for the kvm_io_bus framework. This is
needed both by the generic KVM code in virt/kvm as well as by
architecture specific code under arch/. Putting the header file in
virt/kvm and using local includes in the architecture part seems at
least dodgy to me, so let's move the file into include/kvm, so that a
more natural "#include <kvm/iodev.h>" can be used by all of the code.
This also solves a problem later when using struct kvm_io_device
in arm_vgic.h.
Fixing up the FSF address in the GPL header and a wrong include path
on the way.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:12 +00:00
Nikolay Nikolaev
e32edf4fd0 KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:11 +00:00
Igor Mammedov
744961341d kvm: avoid page allocation failure in kvm_set_memory_region()
KVM guest can fail to startup with following trace on host:

qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
  dump_stack+0x47/0x67
  warn_alloc_failed+0xee/0x150
  __alloc_pages_direct_compact+0x14a/0x150
  __alloc_pages_nodemask+0x776/0xb80
  alloc_kmem_pages+0x3a/0x110
  kmalloc_order+0x13/0x50
  kmemdup+0x1b/0x40
  __kvm_set_memory_region+0x24a/0x9f0 [kvm]
  kvm_set_ioapic+0x130/0x130 [kvm]
  kvm_set_memory_region+0x21/0x40 [kvm]
  kvm_vm_ioctl+0x43f/0x750 [kvm]

Failure happens when attempting to allocate pages for
'struct kvm_memslots', however it doesn't have to be
present in physically contiguous (kmalloc-ed) address
space, change allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more then a page.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23 21:23:44 -03:00
Takuya Yoshikawa
58d2930f4e KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()
When all bits in mask are not set,
kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do.  But since
it needs to be called from the generic code, it cannot be inlined, and
a few function calls, two when PML is enabled, are wasted.

Since it is common to see many pages remain clean, e.g. framebuffers can
stay calm for a long time, it is worth eliminating this overhead.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-18 22:23:33 -03:00
Marcelo Tosatti
f710a12d73 Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
Fixes for KVM/ARM for 4.0-rc5.

Fixes page refcounting issues in our Stage-2 page table management code,
fixes a missing unlock in a gicv3 error path, and fixes a race that can
cause lost interrupts if signals are pending just prior to entering the
guest.
2015-03-16 20:08:56 -03:00
Christoffer Dall
1a74847885 arm/arm64: KVM: Fix migration race in the arch timer
When a VCPU is no longer running, we currently check to see if it has a
timer scheduled in the future, and if it does, we schedule a host
hrtimer to notify is in case the timer expires while the VCPU is still
not running.  When the hrtimer fires, we mask the guest's timer and
inject the timer IRQ (still relying on the guest unmasking the time when
it receives the IRQ).

This is all good and fine, but when migration a VM (checkpoint/restore)
this introduces a race.  It is unlikely, but possible, for the following
sequence of events to happen:

 1. Userspace stops the VM
 2. Hrtimer for VCPU is scheduled
 3. Userspace checkpoints the VGIC state (no pending timer interrupts)
 4. The hrtimer fires, schedules work in a workqueue
 5. Workqueue function runs, masks the timer and injects timer interrupt
 6. Userspace checkpoints the timer state (timer masked)

At restore time, you end up with a masked timer without any timer
interrupts and your guest halts never receiving timer interrupts.

Fix this by only kicking the VCPU in the workqueue function, and sample
the expired state of the timer when entering the guest again and inject
the interrupt and mask the timer only then.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14 13:48:00 +01:00
Christoffer Dall
47a98b15ba arm/arm64: KVM: support for un-queuing active IRQs
Migrating active interrupts causes the active state to be lost
completely. This implements some additional bitmaps to track the active
state on the distributor and export this to user space.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14 13:46:44 +01:00