Commit Graph

937 Commits

Author SHA1 Message Date
Wanpeng Li 2cbd78244f KVM: trace kvm_halt_poll_ns grow/shrink
Tracepoint for dynamic halt_pool_ns, fired on every potential change.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06 16:33:14 +02:00
Wanpeng Li aca6ff29c4 KVM: dynamic halt-polling
There is a downside of always-poll since poll is still happened for idle
vCPUs which can waste cpu usage. This patchset add the ability to adjust
halt_poll_ns dynamically, to grow halt_poll_ns when shot halt is detected,
and to shrink halt_poll_ns when long halt is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.

                        no-poll      always-poll    dynamic-poll
-----------------------------------------------------------------------
Idle (nohz) vCPU %c0     0.15%        0.3%            0.2%
Idle (250HZ) vCPU %c0    1.1%         4.6%~14%        1.2%
TCP_RR latency           34us         27us            26.7us

"Idle (X) vCPU %c0" is the percent of time the physical cpu spent in
c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the
guest was tickless. (250HZ) means the guest was ticking at 250HZ.

The big win is with ticking operating systems. Running the linux guest
with nohz=off (and HZ=250), we save 3.4%~12.8% CPUs/second and get close
to no-polling overhead levels by using the dynamic-poll. The savings
should be even higher for higher frequency ticks.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Simplify the patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06 16:32:43 +02:00
Wanpeng Li 19020f8ab8 KVM: make halt_poll_ns per-vCPU
Change halt_poll_ns into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06 16:27:10 +02:00
Christoffer Dall 4ad9e16af3 arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
Provide a better quality of implementation and be architecture compliant
on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
reset of the timer.

This change alone fixes the UEFI reset issue reported by Laszlo back in
February.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Drew Jones <drjones@redhat.com>
Cc: Wei Huang <wei@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04 16:26:56 +01:00
Christoffer Dall 04bdfa8ab5 arm/arm64: KVM: vgic: Move active state handling to flush_hwstate
We currently set the physical active state only when we *inject* a new
pending virtual interrupt, but this is actually not correct, because we
could have been preempted and run something else on the system that
resets the active state to clear.  This causes us to run the VM with the
timer set to fire, but without setting the physical active state.

The solution is to always check the LR configurations, and we if have a
mapped interrupt in the LR in either the pending or active state
(virtual), then set the physical active state.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04 16:26:52 +01:00
Paolo Bonzini e3dbc572fe Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue
Patch queue for ppc - 2015-08-22

Highlights for KVM PPC this time around:

  - Book3S: A few bug fixes
  - Book3S: Allow micro-threading on POWER8
2015-08-22 14:57:59 -07:00
Marc Zyngier f120cd6533 KVM: arm/arm64: timer: Allow the timer to control the active state
In order to remove the crude hack where we sneak the masked bit
into the timer's control register, make use of the phys_irq_map
API control the active state of the interrupt.

This causes some limited changes to allow for potential error
propagation.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:26 +01:00
Marc Zyngier 773299a570 KVM: arm/arm64: vgic: Prevent userspace injection of a mapped interrupt
Virtual interrupts mapped to a HW interrupt should only be triggered
from inside the kernel. Otherwise, you could end up confusing the
kernel (and the GIC's) state machine.

Rearrange the injection path so that kvm_vgic_inject_irq is
used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is
used for mapped interrupts. The latter should only be called from
inside the kernel (timer, irqfd).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:26 +01:00
Marc Zyngier 6e84e0e067 KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active
In order to control the active state of an interrupt, introduce
a pair of accessors allowing the state to be set/queried.

This only affects the logical state, and the HW state will only be
applied at world-switch time.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:26 +01:00
Marc Zyngier 08fd6461e8 KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest
To allow a HW interrupt to be injected into a guest, we lookup the
guest virtual interrupt in the irq_phys_map list, and if we have
a match, encode both interrupts in the LR.

We also mark the interrupt as "active" at the host distributor level.

On guest EOI on the virtual interrupt, the host interrupt will be
deactivated.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:25 +01:00
Marc Zyngier 6c3d63c9a2 KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts
In order to be able to feed physical interrupts to a guest, we need
to be able to establish the virtual-physical mapping between the two
worlds.

The mappings are kept in a set of RCU lists, indexed by virtual interrupts.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:25 +01:00
Marc Zyngier 7a67b4b7e0 KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
We only set the irq_queued flag for level interrupts, meaning
that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
for all interrupts.

This will allow us to inject edge HW interrupts, for which the
state ACTIVE+PENDING is not allowed.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:25 +01:00
Marc Zyngier fb182cf845 KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR
Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
field, we can encode that information into the list registers.

This patch provides implementations for both GICv2 and GICv3.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12 11:28:24 +01:00
Paolo Bonzini dd489240a2 KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-30 16:02:54 +02:00
Paolo Bonzini d71ba78834 KVM: move code related to KVM_SET_BOOT_CPU_ID to x86
This is another remnant of ia64 support.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-29 14:27:21 +02:00
Paolo Bonzini 5544eb9b81 KVM: count number of assigned devices
If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10 13:25:26 +02:00
Peter Zijlstra 2ecd9d29ab sched, preempt_notifier: separate notifier registration from static_key inc/dec
Commit 1cde2930e1 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems.  First, the preempt-notifier API needs to sleep with the
addition of the static_key, we do however need to hold off preemption
while modifying the preempt notifier list, otherwise a preemption could
observe an inconsistent list state.  KVM correctly registers and
unregisters preempt notifiers with preemption disabled, so the sleep
caused dmesg splats.

Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put).  With a single uniprocessor guest the static key
would move between 0 and 1 continuously, hitting the slow path on every
userspace exit.

To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.

Fixes: 1cde2930e1 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-03 18:55:00 +02:00
Linus Torvalds e3d8238d7f Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "Mostly refactoring/clean-up:

   - CPU ops and PSCI (Power State Coordination Interface) refactoring
     following the merging of the arm64 ACPI support, together with
     handling of Trusted (secure) OS instances

   - Using fixmap for permanent FDT mapping, removing the initial dtb
     placement requirements (within 512MB from the start of the kernel
     image).  This required moving the FDT self reservation out of the
     memreserve processing

   - Idmap (1:1 mapping used for MMU on/off) handling clean-up

   - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
     Last stages of CPU power down/up are handled by firmware already

   - "Alternatives" (run-time code patching) refactoring and support for
     immediate branch patching, GICv3 CPU interface access

   - User faults handling clean-up

  And some fixes:

   - Fix for VDSO building with broken ELF toolchains

   - Fix another case of init_mm.pgd usage for user mappings (during
     ASID roll-over broadcasting)

   - Fix for FPSIMD reloading after CPU hotplug

   - Fix for missing syscall trace exit

   - Workaround for .inst asm bug

   - Compat fix for switching the user tls tpidr_el0 register"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: use private ratelimit state along with show_unhandled_signals
  arm64: show unhandled SP/PC alignment faults
  arm64: vdso: work-around broken ELF toolchains in Makefile
  arm64: kernel: rename __cpu_suspend to keep it aligned with arm
  arm64: compat: print compat_sp instead of sp
  arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
  arm64: entry: fix context tracking for el0_sp_pc
  arm64: defconfig: enable memtest
  arm64: mm: remove reference to tlb.S from comment block
  arm64: Do not attempt to use init_mm in reset_context()
  arm64: KVM: Switch vgic save/restore to alternative_insn
  arm64: alternative: Introduce feature for GICv3 CPU interface
  arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
  arm64: fix bug for reloading FPSIMD state after CPU hotplug.
  arm64: kernel thread don't need to save fpsimd context.
  arm64: fix missing syscall trace exit
  arm64: alternative: Work around .inst assembler bugs
  arm64: alternative: Merge alternative-asm.h into alternative.h
  arm64: alternative: Allow immediate branch as alternative instruction
  arm64: Rework alternate sequence for ARM erratum 845719
  ...
2015-06-24 10:02:15 -07:00
Kevin Mulvey 0b8ba4a2b6 KVM: fix checkpatch.pl errors in kvm/coalesced_mmio.h
Tabs rather than spaces

Signed-off-by: Kevin Mulvey <kmulvey@linux.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Kevin Mulvey d626f3d5b3 KVM: fix checkpatch.pl errors in kvm/async_pf.h
fix brace spacing

Signed-off-by: Kevin Mulvey <kmulvey@linux.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:25 +02:00
Joerg Roedel e73f61e41f kvm: irqchip: Break up high order allocations of kvm_irq_routing_table
The allocation size of the kvm_irq_routing_table depends on
the number of irq routing entries because they are all
allocated with one kzalloc call.

When the irq routing table gets bigger this requires high
order allocations which fail from time to time:

	qemu-kvm: page allocation failure: order:4, mode:0xd0

This patch fixes this issue by breaking up the allocation of
the table and its entries into individual kzalloc calls.
These could all be satisfied with order-0 allocations, which
are less likely to fail.

The downside of this change is the lower performance, because
of more calls to kzalloc. But given how often kvm_set_irq_routing
is called in the lifetime of a guest, it doesn't really
matter much.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[Avoid sparse warning through rcu_access_pointer. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:25 +02:00
Paolo Bonzini 05fe125fa3 Merge tag 'kvm-arm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM changes for v4.2:

- Proper guest time accounting
- FP access fix for 32bit
- The usual pile of GIC fixes
- PSCI fixes
- Random cleanups
2015-06-19 17:15:24 +02:00
Marc Zyngier c62e631d4a KVM: arm/arm64: vgic: Remove useless arm-gic.h #include
Back in the days, vgic.c used to have an intimate knowledge of
the actual GICv2. These days, this has been abstracted away into
hardware-specific backends.

Remove the now useless arm-gic.h #include directive, making it
clear that GICv2 specific code doesn't belong here.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-18 15:50:31 +01:00
Marc Zyngier 4839ddc27b KVM: arm/arm64: vgic: Avoid injecting reserved IRQ numbers
Commit fd1d0ddf2a (KVM: arm/arm64: check IRQ number on userland
injection) rightly limited the range of interrupts userspace can
inject in a guest, but failed to consider the (unlikely) case where
a guest is configured with 1024 interrupts.

In this case, interrupts ranging from 1020 to 1023 are unuseable,
as they have a special meaning for the GIC CPU interface.

Make sure that these number cannot be used as an IRQ. Also delete
a redundant (and similarily buggy) check in kvm_set_irq.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 15:18:59 +01:00
Marc Zyngier f5a202db12 KVM: arm: vgic: Drop useless Group0 warning
If a GICv3-enabled guest tries to configure Group0, we print a
warning on the console (because we don't support Group0 interrupts).

This is fairly pointless, and would allow a guest to spam the
console. Let's just drop the warning.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:58:12 +01:00