Commit Graph

152 Commits

Author SHA1 Message Date
Gleb Natapov
b4a2f5e723 KVM: Avoid redelivery of edge interrupt before next edge
The check for an edge is broken in current ioapic code. ioapic->irr is
cleared on each edge interrupt by ioapic_service() and this makes
old_irr != ioapic->irr condition in kvm_ioapic_set_irq() to be always
true. The patch fixes the code to properly recognise edge.

Some HW emulation calls set_irq() without level change. If each such
call is propagated to an OS it may confuse a device driver. This is the
case with keyboard device emulation and Windows XP x64  installer on SMP VM.
Each keystroke produce two interrupts (down/up) one interrupt is
submitted to CPU0 and another to CPU1. This confuses Windows somehow
and it ignores keystrokes.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-09 11:45:49 +03:00
Michael S. Tsirkin
5116d8f6b9 KVM: fix ack not being delivered when msi present
kvm_notify_acked_irq does not check irq type, so that it sometimes
interprets msi vector as irq.  As a result, ack notifiers are not
called, which typially hangs the guest.  The fix is to track and
check irq type.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05 14:03:43 +03:00
Marcelo Tosatti
84261923d3 KVM: protect concurrent make_all_cpus_request
make_all_cpus_request contains a race condition which can
trigger false request completed status, as follows:

CPU0                                              CPU1

if (test_and_set_bit(req,&vcpu->requests))
   ....                                        	   if (test_and_set_bit(req,&vcpu->requests))
   ..                                                  return
proceed to smp_call_function_many(wait=1)

Use a spinlock to serialize concurrent CPUs.

Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28 14:10:29 +03:00
Izik Eidus
e244584fe3 KVM: Fix dirty bit tracking for slots with large pages
When slot is already allocated and being asked to be tracked we need
to break the large pages.

This code flush the mmu when someone ask a slot to start dirty bit
tracking.

Cc: stable@kernel.org
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28 14:10:29 +03:00
Yinghai Lu
aee74f3bb3 kvm: remove the duplicated cpumask_clear
zalloc_cpumask_var already cleared it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11 20:04:37 -07:00
Avi Kivity
09f8ca74ae KVM: Prevent overflow in largepages calculation
If userspace specifies a memory slot that is larger than 8 petabytes, it
could overflow the largepages variable.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 15:18:16 +03:00
Avi Kivity
ac04527f79 KVM: Disable large pages on misaligned memory slots
If a slots guest physical address and host virtual address unequal (mod
large page size), then we would erronously try to back guest large pages
with host large pages.  Detect this misalignment and diable large page
support for the trouble slot.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 15:17:58 +03:00
Marcelo Tosatti
b43b1901ad KVM: take mmu_lock when updating a deleted slot
kvm_handle_hva relies on mmu_lock protection to safely access
the memslot structures.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:54 +03:00
Marcelo Tosatti
547de29e5b KVM: protect assigned dev workqueue, int handler and irq acker
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
interrupt handler function. It does:

        if (dev->host_irq_disabled) {
                enable_irq(dev->host_irq);
                dev->host_irq_disabled = false;
        }

If an interrupt triggers before the host->dev_irq_disabled assignment,
it will disable the interrupt and set dev->host_irq_disabled to true.

On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
it.

Other than that, having the interrupt handler and work handlers run in
parallel sounds like asking for trouble (could not spot any obvious
problem, but better not have to, its fragile).

CC: sheng.yang@intel.com
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:53 +03:00
Chris Wright
efbc100c20 KVM: Trivial format fix in setup_routing_entry()
Remove extra tab.

Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:50 +03:00
Sheng Yang
8e1c18157d KVM: VMX: Disable VMX when system shutdown
Intel TXT(Trusted Execution Technology) required VMX off for all cpu to work
when system shutdown.

CC: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:50 +03:00
Sheng Yang
522c68c441 KVM: Enable snooping control for supported hardware
Memory aliases with different memory type is a problem for guest. For the guest
without assigned device, the memory type of guest memory would always been the
same as host(WB); but for the assigned device, some part of memory may be used
as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of
host memory type then be a potential issue.

Snooping control can guarantee the cache correctness of memory go through the
DMA engine of VT-d.

[avi: fix build on ia64]

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:50 +03:00
Gleb Natapov
78646121e9 KVM: Fix interrupt unhalting a vcpu when it shouldn't
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking
if interrupt window is actually opened.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:33 +03:00
Gleb Natapov
09cec75488 KVM: Timer event should not unconditionally unhalt vcpu.
Currently timer events are processed before entering guest mode. Move it
to main vcpu event loop since timer events should be processed even while
vcpu is halted.  Timer may cause interrupt/nmi to be injected and only then
vcpu will be unhalted.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:33 +03:00
Gleb Natapov
f00be0cae4 KVM: MMU: do not free active mmu pages in free_mmu_pages()
free_mmu_pages() should only undo what alloc_mmu_pages() does.
Free mmu pages from the generic VM destruction function, kvm_destroy_vm().

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:30 +03:00
Sheng Yang
e56d532f20 KVM: Device assignment framework rework
After discussion with Marcelo, we decided to rework device assignment framework
together. The old problems are kernel logic is unnecessary complex. So Marcelo
suggest to split it into a more elegant way:

1. Split host IRQ assign and guest IRQ assign. And userspace determine the
combination. Also discard msi2intx parameter, userspace can specific
KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx convertion.

2. Split assign IRQ and deassign IRQ. Import two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.

This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the
old interface).

[avi: replace homemade bitcount() by hweight_long()]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:29 +03:00
Gleb Natapov
58c2dde17d KVM: APIC: get rid of deliver_bitmask
Deliver interrupt during destination matching loop.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:27 +03:00
Gleb Natapov
e1035715ef KVM: change the way how lowest priority vcpu is calculated
The new way does not require additional loop over vcpus to calculate
the one with lowest priority as one is chosen during delivery bitmap
construction.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:27 +03:00
Gleb Natapov
343f94fe4d KVM: consolidate ioapic/ipi interrupt delivery logic
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:27 +03:00
Gleb Natapov
a53c17d21c KVM: ioapic/msi interrupt delivery consolidation
ioapic_deliver() and kvm_set_msi() have code duplication. Move
the code into ioapic_deliver_entry() function and call it from
both places.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:27 +03:00
Gleb Natapov
6da7e3f643 KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts
Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:26 +03:00
Sheng Yang
74a3a8f152 KVM: Merge kvm_ioapic_get_delivery_bitmask into kvm_get_intr_delivery_bitmask
Gleb fixed bitmap ops usage in kvm_ioapic_get_delivery_bitmask.

Sheng merged two functions, as well as fixed several issues in
kvm_get_intr_delivery_bitmask
1. deliver_bitmask is a bitmap rather than a unsigned long intereger.
2. Lowest priority target bitmap wrong calculated by mistake.
3. Prevent potential NULL reference.
4. Declaration in include/kvm_host.h caused powerpc compilation warning.
5. Add warning for guest broadcast interrupt with lowest priority delivery mode.
6. Removed duplicate bitmap clean up in caller of kvm_get_intr_delivery_bitmask.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-06-10 11:48:26 +03:00
Sheng Yang
d510d6cc65 KVM: Enable MSI-X for KVM assigned device
This patch finally enable MSI-X.

What we need for MSI-X:
1. Intercept one page in MMIO region of device. So that we can get guest desired
MSI-X table and set up the real one. Now this have been done by guest, and
transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.

2. Information for incoming interrupt. Now one device can have more than one
interrupt, and they are all handled by one workqueue structure. So we need to
identify them. The previous patch enable gsi_msg_pending_bitmap get this done.

3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
message address/data. We used same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.

What we lack for now:
1. The PCI spec said nothing can existed with MSI-X table in the same page of
MMIO region, except pending bits. The patch ignore pending bits as the first
step (so they are always 0 - no pending).

2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
didn't support this, and Linux also don't work in this way.

3. The patch didn't implement MSI-X mask all and mask single entry. I would
implement the former in driver/pci/msi.c later. And for single entry, userspace
should have reposibility to handle it.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:23 +03:00
Sheng Yang
2350bd1f62 KVM: Add MSI-X interrupt injection logic
We have to handle more than one interrupt with one handler for MSI-X. Avi
suggested to use a flag to indicate the pending. So here is it.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:23 +03:00
Sheng Yang
c1e0151429 KVM: Ioctls for init MSI-X entry
Introduce KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY two ioctls.

This two ioctls are used by userspace to specific guest device MSI-X entry
number and correlate MSI-X entry with GSI during the initialization stage.

MSI-X should be well initialzed before enabling.

Don't support change MSI-X entry number for now.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:23 +03:00