Commit Graph

872 Commits

Author SHA1 Message Date
Alex Bennée 71760950bf arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn
This helps re-factor away some of the repetitive code and makes the code
flow more nicely.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14 13:44:52 +01:00
Christoffer Dall ae705930fc arm/arm64: KVM: Keep elrsr/aisr in sync with software model
There is an interesting bug in the vgic code, which manifests itself
when the KVM run loop has a signal pending or needs a vmid generation
rollover after having disabled interrupts but before actually switching
to the guest.

In this case, we flush the vgic as usual, but we sync back the vgic
state and exit to userspace before entering the guest.  The consequence
is that we will be syncing the list registers back to the software model
using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
potentially overwriting a list register containing an interrupt.

This showed up during migration testing where we would capture a state
where the VM has masked the arch timer but there were no interrupts,
resulting in a hung test.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Alex Bennee <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14 13:42:07 +01:00
Wei Yongjun b52104e509 arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create()
Add the missing unlock before return from function kvm_vgic_create()
in the error handling case.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-13 11:40:57 +01:00
Eric Auger 174178fed3 KVM: arm/arm64: add irqfd support
This patch enables irqfd on arm/arm64.

Both irqfd and resamplefd are supported. Injection is implemented
in vgic.c without routing.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.

Irqfd injection is restricted to SPI. The rationale behind not
supporting PPI irqfd injection is that any device using a PPI would
be a private-to-the-CPU device (timer for instance), so its state
would have to be context-switched along with the VCPU and would
require in-kernel wiring anyhow. It is not a relevant use case for
irqfds.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12 15:15:34 +01:00
Eric Auger 649cf73994 KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstate
To prepare for irqfd addition, coarse grain locking is removed at
kvm_vgic_sync_hwstate level and finer grain locking is introduced in
vgic_process_maintenance only.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12 15:15:33 +01:00
Eric Auger 01c94e64f5 KVM: introduce kvm_arch_intc_initialized and use it in irqfd
Introduce __KVM_HAVE_ARCH_INTC_INITIALIZED define and
associated kvm_arch_intc_initialized function. This latter
allows to test whether the virtual interrupt controller is initialized
and ready to accept virtual IRQ injection. On some architectures,
the virtual interrupt controller is dynamically instantiated, justifying
that kind of check.

The new function can now be used by irqfd to check whether the
virtual interrupt controller is ready on KVM_IRQFD request. If not,
KVM_IRQFD returns -EAGAIN.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12 15:15:32 +01:00
Mark Rutland 0f37247574 KVM: vgic: add virt-capable compatible strings
Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC
compatible list, and while this is correct (and supported by the GIC
driver), KVM will fail to detect that it can support these cases.

This patch adds the missing strings to the VGIC code. The of_device_id
entries are padded to keep the probe function data aligned.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-11 15:27:48 +01:00
Paolo Bonzini dc9be0fac7 kvm: move advertising of KVM_CAP_IRQFD to common code
POWER supports irqfds but forgot to advertise them.  Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.

To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.

Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e21053a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 21:18:59 -03:00
Xiubo Li 1170adc6dd KVM: Use pr_info/pr_err in kvm_main.c
WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then
dev_info(dev, ... then pr_info(...  to printk(KERN_INFO ...
+   printk(KERN_INFO "kvm: exiting hardware virtualization\n");

WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then
dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
+	printk(KERN_ERR "kvm: misc device register failed\n");

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:45 -03:00
Xiubo Li 20e87b7224 KVM: Fix indentation in kvm_main.c
ERROR: code indent should use tabs where possible
+                                 const struct kvm_io_range *r2)$

WARNING: please, no spaces at the start of a line
+                                 const struct kvm_io_range *r2)$

This patch fixes this ERROR & WARNING to reduce noise when checking new
patches in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:44 -03:00
Xiubo Li b7d409deb9 KVM: no space before tabs in kvm_main.c
WARNING: please, no space before tabs
+ * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$

WARNING: please, no space before tabs
+^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$

WARNING: please, no space before tabs
+^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$

This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:44 -03:00
Xiubo Li f95ef0cd02 KVM: Missing blank line after declarations in kvm_main.c
There are many Warnings like this:
WARNING: Missing a blank line after declarations
+	struct kvm_coalesced_mmio_zone zone;
+	r = -EFAULT;

This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:44 -03:00
Xiubo Li ee543159d5 KVM: EXPORT_SYMBOL should immediately follow its function
WARNING: EXPORT_SYMBOL(foo); should immediately follow its
function/variable
+EXPORT_SYMBOL_GPL(gfn_to_page);

This patch fixes these warnings to reduce noise when checking new
patches in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:44 -03:00
Xiubo Li f4fee93270 KVM: Fix ERROR: do not initialise statics to 0 or NULL in kvm_main.c
ERROR: do not initialise statics to 0 or NULL
+static int kvm_usage_count = 0;

The kvm_usage_count will be placed to .bss segment when linking, so
not need to set it to 0 here obviously.

This patch fixes this ERROR to reduce noise when checking new patches
in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:44 -03:00
Xiubo Li a642a17567 KVM: Fix WARNING: labels should not be indented in kvm_main.c
WARNING: labels should not be indented
+   out_free_irq_routing:

This patch fixes this WARNING to reduce noise when checking new patches
in kvm_main.c.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:43 -03:00
Xiubo Li 893bdbf165 KVM: Fix WARNINGs for 'sizeof(X)' instead of 'sizeof X' in kvm_main.c
There are many WARNINGs like this:
WARNING: sizeof tr should be sizeof(tr)
+	if (copy_from_user(&tr, argp, sizeof tr))

In kvm_main.c many places are using 'sizeof(X)', and the other places
are using 'sizeof X', while the kernel recommands to use 'sizeof(X)',
so this patch will replace all 'sizeof X' to 'sizeof(X)' to make them
consistent and at the same time to reduce the WARNINGs noise when we
are checking new patches.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:43 -03:00
Thomas Huth 548ef28449 KVM: Get rid of kvm_kvfree()
kvm_kvfree() provides exactly the same functionality as the
new common kvfree() function - so let's simply replace the
kvm function with the common function.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:43 -03:00
Christian Borntraeger 0fa9778895 KVM: make halt_poll_ns static
halt_poll_ns is used only locally. Make it static.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:43 -03:00
Kevin Mulvey ae548c5c80 KVM: fix checkpatch.pl errors in kvm/irqchip.c
Fix whitespace around while

Signed-off-by: Kevin Mulvey <kmulvey@linux.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:42 -03:00
Kevin Mulvey bfda0e8491 KVM: white space formatting in kvm_main.c
Better alignment of loop using tabs rather than spaces, this
makes checkpatch.pl happier.

Signed-off-by: Kevin Mulvey <kmulvey@linux.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 10:37:42 -03:00
Linus Torvalds b9085bcbf5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
 "Fairly small update, but there are some interesting new features.

  Common:
     Optional support for adding a small amount of polling on each HLT
     instruction executed in the guest (or equivalent for other
     architectures).  This can improve latency up to 50% on some
     scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests).  This
     also has to be enabled manually for now, but the plan is to
     auto-tune this in the future.

  ARM/ARM64:
     The highlights are support for GICv3 emulation and dirty page
     tracking

  s390:
     Several optimizations and bugfixes.  Also a first: a feature
     exposed by KVM (UUID and long guest name in /proc/sysinfo) before
     it is available in IBM's hypervisor! :)

  MIPS:
     Bugfixes.

  x86:
     Support for PML (page modification logging, a new feature in
     Broadwell Xeons that speeds up dirty page tracking), nested
     virtualization improvements (nested APICv---a nice optimization),
     usual round of emulation fixes.

     There is also a new option to reduce latency of the TSC deadline
     timer in the guest; this needs to be tuned manually.

     Some commits are common between this pull and Catalin's; I see you
     have already included his tree.

  Powerpc:
     Nothing yet.

     The KVM/PPC changes will come in through the PPC maintainers,
     because I haven't received them yet and I might end up being
     offline for some part of next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: ia64: drop kvm.h from installed user headers
  KVM: x86: fix build with !CONFIG_SMP
  KVM: x86: emulate: correct page fault error code for NoWrite instructions
  KVM: Disable compat ioctl for s390
  KVM: s390: add cpu model support
  KVM: s390: use facilities and cpu_id per KVM
  KVM: s390/CPACF: Choose crypto control block format
  s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
  KVM: s390: reenable LPP facility
  KVM: s390: floating irqs: fix user triggerable endless loop
  kvm: add halt_poll_ns module parameter
  kvm: remove KVM_MMIO_SIZE
  KVM: MIPS: Don't leak FPU/DSP to guest
  KVM: MIPS: Disable HTW while in guest
  KVM: nVMX: Enable nested posted interrupt processing
  KVM: nVMX: Enable nested virtual interrupt delivery
  KVM: nVMX: Enable nested apic register virtualization
  KVM: nVMX: Make nested control MSRs per-cpu
  KVM: nVMX: Enable nested virtualize x2apic mode
  KVM: nVMX: Prepare for using hardware MSR bitmap
  ...
2015-02-13 09:55:09 -08:00
Andrea Arcangeli 0664e57ff0 mm: gup: kvm use get_user_pages_unlocked
Use the more generic get_user_pages_unlocked which has the additional
benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault
(which allows the first page fault in an unmapped area to be always able
to block indefinitely by being allowed to release the mmap_sem).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Christian Borntraeger de8e5d7440 KVM: Disable compat ioctl for s390
We never had a 31bit QEMU/kuli running. We would need to review several
ioctls to check if this creates holes, bugs or whatever to make it work.
Lets just disable compat support for KVM on s390.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-09 12:44:14 +01:00
Paolo Bonzini f781951299 kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest.  KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too.  When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest.  This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host.  The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter.  It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls.  During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark.  The wasted time is thus very low.  Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer.  Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06 13:08:37 +01:00
Kai Huang 3b0f1d01e5 KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty
We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:30:38 +01:00